R: read files with a for loop

I just want to read 10 files in R. For each I want to calculate something.
Example files:
stat1_pwg1.out
stat23_pwg2.out
...
stat45_pwg10.out
I tried this:
for (i in 1:10){
  Data=paste("../XYZ/*_pwg",i,".out",sep="")
  line=read.table(Data,head=T)
}
But it does not work. Any hints?

I suspect your problem comes from the wildcard *. A better way to do this might be to first store the file names using dir, then find the ones you want.
files <- dir("../XYZ",pattern="stat[0-9]+_pwg[0-9]+\.out")
for(f in files) {
line=read.table(Data,head=T)
}
You could also use one of the apply family of functions to eliminate the for loop entirely.
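For example, a minimal lapply sketch (assuming, as above, that the files sit in ../XYZ and each has a header row):
files <- dir("../XYZ", pattern = "stat[0-9]+_pwg[0-9]+\\.out", full.names = TRUE)
data_list <- lapply(files, read.table, header = TRUE)  # one data frame per file
names(data_list) <- basename(files)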

A few things about your code:
paste is vectorised, so you can take it out of the loop.
paste("../XYZ/*_pwg", 1:10, ".out", sep = "")
(Though as you'll see in a moment, you don't actually need to use paste at all.)
read.table won't accept wildcards; it needs an exact match on the file name.
Rather than trying to construct a vector of the filenames, you might be better off using dir to find the files that exist in your directory, filtered by a suitable naming scheme.
To filter the files, you use a regular expression in the pattern argument. You can convert from wildcards to regular expression using glob2rx.
file_names <- dir("../XYZ", pattern = glob2rx("stat*_pwg*.out"))
data_list <- lapply(filenames, read.table, header = TRUE)
For a slightly more specific fit, where the wildcards match only digits rather than anything, you need to use regular expressions directly.
file_names <- dir("../XYZ", pattern = "^stat[[:digit:]]+_pwg[[:digit:]]+\\.out$")

files <- dir(pattern = "Rip1_")
files
for (f in files){ assign(f, Readfunc(f)) }
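Note that assign creates one global object per file, which can get unwieldy; a named list is usually easier to work with afterwards. A sketch, where Readfunc stands in for whatever reader function you are using:
results <- setNames(lapply(files, Readfunc), files)  # one list element per file, named by file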

Selectively reading in csv files in R where filenames don't end in a particular suffix

I have a folder of csv files from an experiment that I want to read into R for analysis. The folder contains two files for every participant, with filenames following the pattern:
"participantID.csv"
"participantID_debrief.csv"
I want to create two variables in R, one for the standard data files and one for the debrief files. I have found the list.files function and see that the standard way to use it would be something like:
files <- list.files(path="D:/data", pattern=".csv")
But I want to use the pattern parameter to match first only the filenames that don't end in "_debrief.csv" and then only the ones that do. How would I write the regular expression (assuming that's what pattern is) to achieve this?
Try:
files = list.files(path="D:/data")
non_debrief = files[!grepl("_debrief\\.csv$", files)]
debrief = files[grepl("_debrief\\.csv$", files)]
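From there, reading each group is a single lapply call (a sketch, assuming the csv files have header rows):
non_debrief_data <- lapply(file.path("D:/data", non_debrief), read.csv)  # standard data files
debrief_data <- lapply(file.path("D:/data", debrief), read.csv)          # debrief files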
For a tidyverse approach, you could use the fs library (https://www.tidyverse.org/blog/2018/01/fs-1.0.0/).
base_dir = 'D:/data/'
file_list_debrief = fs::dir_ls(base_dir, glob = '*_debrief.csv')
file_list_non_debrief = fs::dir_ls(base_dir, glob = '*_debrief.csv', invert = TRUE)

A copying nightmare, choosing files to copy based on files in another folder

I have a bit of an issue with using file.copy.
I need to copy .tif files from a directory with several subdirectories (where the .tif files are), based on the names of files in another directory. I have the following code (which is almost working):
ValidatedDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Validated"
RawDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Raw"
OutputDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Ouputfolder"
ValidatedImages <- list.files(ValidatedDirectory)
# this is to remove the extra bit that is added onto the validated images [working]
pattern <- gsub("_hc", "", ValidatedImages)
pattern <- paste(gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", pattern), collapse="|")
# this bit tackles finding the relevant files based on the ValidatedImages
filesinRAW <- list.files(path = RawDirectory,
                         recursive = TRUE,
                         include.dirs = FALSE,
                         full.names = FALSE)
filesinRAW <- as.list(filesinRAW)
# this removes subdirectory prefix in front of the file and .tif which confuses it
filesinRAW <- as.list(gsub("\\d\\d\\d\\d/", "", filesinRAW))
filesinRaw <- as.list(gsub(".tif", "", filesinRAW))
tocopy <- grep(filesinRAW, pattern = pattern, value = TRUE)
tocopy <- as.list(tocopy)
tocopy <- as.list(gsub(".tif", "", tocopy))
setwd(RawDirectory)
file.copy(from = tocopy, to = OutputDirectory, overwrite = TRUE)
I get a "No such file or directory" error, but the files do exist (obviously), so I must be doing something wrong with the naming.
I have been having a bash at it for a good while; if helpful, I can upload the example data and share the link.
Thanks for any help community!
When debugging, try to break down your code to see if at each step your variables are still what you expect them to be.
That said, I see several problems in your code right now:
grep works with pattern being a length-one regular expression. If you give it multiple regular expressions, it uses the first one (with a warning, which you don't see if you've disabled them).
To use multiple matches, you can combine sapply and apply: filesinRAW[apply(sapply(pattern, grepl, x=filesinRAW), 1, any)]. But see the last point.
grep by default uses pattern as a regular expression, which may break things if your pattern contains characters that are parsed. For example, grep('^test', '^test') gives zero results. To check if a string contains a literal string, you can use grep(..., fixed=TRUE)
In the last step, you use gsub(".tif", "", tocopy), which removes any pattern like .tif. I suppose you meant to add .tif again at the end; right now you are trying to copy files without an extension, which won't be found. To add it back, you can use paste0.
In several steps you use as.list. Why? In R, most functions are vectorised, so a plain character vector already handles multiple values. The difference between a list and a vector is that lists can store different kinds of objects, but you're not doing that anyway. As far as I can see, the as.list calls don't harm anything, because every function involved will first convert your list back to a character vector.
Finally, as far as I can see, you're first making a list of filenames that need to be copied (pattern), which you then compare to a full list of your files, trying to make them match exactly. Then why use a regular expression? Regular expressions are useful if you only know part of what your filenames look like, but is that your goal here? E.g. if filename1._hc is in your ValidatedDirectory, do filename11.tif and filename12.tif need to be copied as well?
If you're just looking for exact matches, you can directly compare them:
tocopy <- tocopy[tocopy %in% pattern]
But generally, working in R is easy because you can do everything step-by-step, and if you just inspect tocopy, you can see whether your call makes sense.
After much help from @Emil Bode, I have the following solution to the issue (perhaps not the most elegant, but it runs quickly enough on 1000s of .tif files).
ValidatedDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Validated"
RawDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Raw"
OutputDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Ouputfolder"
ValidatedImages <- list.files(ValidatedDirectory)
pattern <- gsub("_hc", "", ValidatedImages)
pattern <- paste(gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", pattern), collapse="|")
filesinRAW <- list.files(path = RawDirectory,
                         recursive = TRUE,
                         include.dirs = FALSE,
                         full.names = FALSE,
                         pattern = pattern)
setwd(RawDirectory)
file.copy(from = filesinRAW, to = OutputDirectory, overwrite = TRUE)
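If you prefer to avoid setwd, an equivalent sketch (same directories as above) asks list.files for full paths instead:
filesinRAW <- list.files(path = RawDirectory,
                         recursive = TRUE,
                         full.names = TRUE,   # return full paths, so no setwd needed
                         pattern = pattern)
file.copy(from = filesinRAW, to = OutputDirectory, overwrite = TRUE)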

Find files with R console

There is the list.files function, which takes a pattern argument. However, how can I find files in a folder if they have the extension yto and contain 111 in the name (right before the dot)?
As @Roland writes, you can use the list.files function. Run ?list.files for its documentation.
Say we wish to list all .txt files in a folder (and its subfolders). Then something like
list.files(pattern = "\\.txt$", recursive = TRUE)
would do the trick. The recursive argument makes the function search subfolders as well. The pattern we're looking for is the regular expression "\\.txt$", meaning that the filename should end with .txt. Consult ?regex for more information on regular expressions.
EDIT: If you search for files which end in 111.tx, you need to modify the above to:
list.files(pattern = "111\\.tx$", recursive = TRUE)

Automatically using the object name as file name with write.table or write.csv

Is there a way to have the object name become the file name character string when using write.table or write.csv?
In the following, a and b are vectors. I will be doing similar comparisons for many other pairs of vectors, and would like to not write out the object name as many times as I have been doing.
unique_downa <- a[!(a %in% b)]
write.csv(unique_downa,file="unique_downa.csv")
Or if anyone has a suggestion for a better way to do this whole process, I'd be happy to hear it.
The idiomatic approach is to use deparse(substitute(x)), e.g.
write.csv.named <- function(x, ...){
  fname <- sprintf('%s.csv', deparse(substitute(x)))
  write.csv(x = x, file = fname, ...)
}
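Usage, with the unique_downa vector from the question:
unique_downa <- a[!(a %in% b)]
write.csv.named(unique_downa)  # writes unique_downa.csv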
It might be easiest to use the names of elements of a list instead of trying to use object names:
mycomparisons <- list(unique_downa = a[!(a %in% b)], unique_downb = b[!(b %in% a)])
mapply(write.csv, mycomparisons, paste0(names(mycomparisons), ".csv"))
The best thing to do is probably to put your vectors in a list, and then do the comparisons, the naming, and the writing out all inside the same loop, but that depends on how similar these comparisons are...
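A sketch of doing all three steps inside one loop, assuming just the two vectors a and b from the question:
vecs <- list(a = a, b = b)
for (nm in names(vecs)) {
  other <- setdiff(names(vecs), nm)                       # the vector we compare against (assumes a pair)
  unique_vals <- vecs[[nm]][!(vecs[[nm]] %in% vecs[[other]])]
  write.csv(unique_vals, paste0("unique_down", nm, ".csv"))  # file name built from the list name
}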

Using R to list all files with a specified extension

I'm very new to R and am working on updating an R script to iterate through a series of .dbf tables created using ArcGIS and produce a series of graphs.
I have a directory, C:\Scratch, that will contain all of my .dbf files. However, when ArcGIS creates these tables, it also includes a .dbf.xml file. I want to remove these .dbf.xml files from my file list and thus my iteration. I've tried searching and experimenting with regular expressions to no avail. This is the basic expression I'm using (excluding all of the various experimentation):
files <- list.files(pattern = "dbf")
Can anyone give me some direction?
files <- list.files(pattern = "\\.dbf$")
$ at the end means the end of the string. "dbf$" will work too, but adding \\. (. is a special character in regular expressions, so you need to escape it) ensures that you match only files with the extension .dbf (in case you have e.g. .adbf files).
Try this, which uses globs rather than regular expressions, so it will only pick out the file names that end in .dbf:
filenames <- Sys.glob("*.dbf")
Peg the pattern to find "\\.dbf" at the end of the string using the $ character:
list.files(pattern = "\\.dbf$")
This gives you the list of files with full paths:
Sys.glob(file.path(file_dir, "*.dbf")) ## file_dir = directory containing the files
I am not very good at using sophisticated regular expressions, so I'd do such a task in the following way:
files <- list.files()
dbf.files <- files[-grep(".xml", files, fixed=T)]
The first line just lists all files from the working directory. The second one drops everything containing ".xml" (grep returns the indices of such strings in the 'files' vector; subsetting with negative indices removes the corresponding entries).
The "fixed" argument for grep is just my whim, as I usually want it to perform crude pattern matching without Perl-style fancy regexes, which may cause surprises for me.
I'm aware that such a solution simply reflects gaps in my education, but for a novice it may be useful =) at least it's easy.
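One caution with the negative-index approach: if no file name contains ".xml", grep returns integer(0), and files[-integer(0)] selects nothing, silently dropping every file. A grepl-based filter avoids that edge case:
dbf.files <- files[!grepl(".xml", files, fixed = TRUE)]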
