How to loop over each file in multiple directories in R - r

I have an object called wanted.bam with the list of wanted file names for all the .bam (is the extension) files in three of my directories path1,path2,path3. I am looping over all these directories to search for the wanted files. What I am trying to do is look for wanted files by looping over each directory and implement a FUNCTION in each file. This loop works for all the matched file in the first directory, but as it progresses to another directory, it breaks giving an error:
Error in value[[3L]](cond) :
failed to open BamFile: file(s) do not exist:
'sort.bam'
my code:
bam.dir<- c("path1","path2","path3")
for (j in 1:length(bam.dir)){
all.bam.files <- list.files(bam.dir[j])
all.bam.files <- grep(wanted.names, all.bam.files, value=TRUE)
print(paste("The wanted number of bam files in this directory:", (length(all.bam.files))))
if(length(all.bam.files)==0){
next
}else{
setwd(bam.dir[j])
}
print(paste("The working directory number:",j,":",(getwd())))
## ****using another loop here for each file to implement a function*****
all.FAD<- {}
for(i in 1:length(all.bam.files)){
output<- FUNCTION(all.bam.files[i])
}
}

You probably don't want to be changing working directory like this. Instead, use the option in list.files, full.names=TRUE, to return the full path of your files. Then, you can just use read.csv, or whatever, on the full path name without need to change directory. Your code is failing because after you set directory, the relative path to the next directory is changed.
If you want to keep changing directories, just make sure you set the directory back to the base directory at the end of the loop.

Related

Running a same script in sub folders

In my main folder i have many sub folders like AA,BB,CC,DD ...etc. and all folders have a common script named run_script.R and i want to run this script in every folder. folder can be any amount.
Its working abut running in first folder only ,but i wanted it to run in every folder.
also when i am using setwd(folder) then showing error
Error in setwd(folder) : cannot change working directory
data_folder <- "C:/Users/mosho/Desktop/New folder (2)/"
allfolders <- data.frame(Folders = list.dirs(path = data_folder, recursive = F, full.names = F))
r_scripts <- "run_script.R"
for (folder in allfolders$Folders) {
#setwd(folder)
message(folder)
source(paste0(data_folder,folder,"/",r_scripts))
}
You are on a right path, I did some minor tweaks to your script which will resolve the issue. The points missing in your scripts are;
the allfolders contains the folder name not the entire explicit path. To set the working directory you need to set give the explicit path, by only calling the folder name will result into error unless you existing working directory is contains that folder. Anyways, its best practice to work with full path names.
also to simplify setting up allfolders as list for iterator will make your life lot easier than a data frame
Below is my work-out;
I created some dummy folders (DIC01, DIC02, DIC03...) under path "C:\Users\XXXXXX\Documents\TEST MAIN", and placed code run_script.R inside each one. This run_script.R contains simple code print("Hello World !!")
Next I set initial working directory where to the path where all the folders present i.e. to path "C:\Users\XXXXXX\Documents\TEST MAIN". Next listed the folders/directories present within this path as a list instead of data frame. Next is for loop which iterate over list of folder names. Inside we reset the working directory by the folder name and source the R code.
data_folder <- "C:\\Users\\XXXXXX\\Documents\\TEST MAIN"
setwd(data_folder)
allfolders <- list.dirs(path = data_folder, recursive = F, full.names = F)
r_scripts <- "run_script.R"
for (folder in allfolders) {
print(folder)
setwd(paste0(data_folder,"\\",folder))
source(paste0(data_folder,"\\",folder,"\\",r_scripts))
}
The result I get after the execution is something like this. First the name of the directory and then execution result.
I hope this resolves you problem. If yes Like/Up vote the answer and let me know.

loop in all files in a directory with r script

I have a directory which contains different subfolders and other files. I need to access each subfolder, read the .tsv file and carry out the following rscript. How to loop this rscript and run it from the terminal?
for(i in my_files){
s <- read.csv('abundance.tsv',sep = '\t')
colnames(compare)[1] <- 'target_id'
colnames(s)[1] <- 'target_id'
s1 <- merge(compare, s, by = "target_id")
output.filename <- gsub("(.*?)", "\\1.csv", i)
write.table(s1, output.filename)
}
list.dirs() returns a list of the directories in the given path and list.files() a list of files in a given path, see here for the documentation.
list.dirs() can be recursive or not, so you can get only directory at the first level and then call list.dirs() again on each sub-directories (inside a loop) or directly get all the sub-directories.
With these two functions you can build your my_files array (since I do not know exactly your directory structure, I can't give an example).
If you have multiples files and want to open only some of them, you can check if the file name contains some sub-string you want (e.g. the file extension). The way to do it is shown here.

How to file copy from a column of files in a data table in R

I am trying to create a new folder with all the files I need. I have a datatable called newlist with a column called fullpath which has the file path for each file.
I have tried the code below but the error message says "'from' path too long" so I don't think it recognises the values as separate file paths.
file.copy(from=newlist[,"fullpath"], to=destination, overwrite=TRUE, recursive=TRUE)
I think I need to specify the files to copy first using the list.files() function but I am unsure how to do this with a column of files in a datatable.
Try this:
lapply(newlist[,"fullpath"], function(x) file.copy(from=x, to=destination, overwrite=TRUE, recursive=TRUE))
Edit:
If your paths do not contain extensions, you can use this code instead:
lapply(newlist[,"fullpath"], function(x) {
# Find the file in the given directory with the basename and add a wildcard extension
f <- file.path(dirname(x), list.files(dirname(x), paste0(basename(x), ".*")))
file.copy(from=f, to=destination, overwrite=TRUE, recursive=TRUE)
})

How can I copy files with duplicate filenames into one directory and retain both files by having the duplicate(s) rename automatically in R?

I need to copy files from multiple folders into one folder, but there are multiple duplicates and I need to keep those. Is there a way to copy files with duplicate filenames into one directory and retain both files by having the duplicate(s) rename automatically in R?
The code I'm using:
my_dirs <- list.dirs("C:/desktop/")
library(plyr)
files<-sapply(my_dirs,list.files,full.names=TRUE,pattern=".xlsx")
new_dir<-"C:/desktop/new folder/"
for(file in files) {
file.copy(file, new_dir)
}
You can probably use file.rename instead. I believe this code should work, but haven't tested it.
for(i in seq_along(files)) {
file.rename(files[i], paste0(new_dir, "file_", i, basename(files[i])))
}
The second argument to file.rename pastes the new file path to the file name prepended with "file_", and a counter. basename strips the original file path and returns only the name of the file. With the addition of the counter, you can be sure that none of the files have the same name.

Looping through folder and finding specific file in R

I am trying to loop through many folders in a directory, looking for a particular xml file buried in one of the folders. I would then like to save the location of that file and then run my code against that file (I will not include that code in this). What I am asking here is to loop through all the folders and then open the specific file.
For example:
My main folder would be: C:\Parsing
It has two folders named "folder1" and "folder2".
each folder has an xml file that I am interested in, lets say its called "needed.xml"
I would like to have a scrip that loops through the directory and finds those particular scripts.
Do you know how I could that in R.
Using list.files and greplyou could look recursively through all sub-folders
rootPath="C:\Parsing"
listFiles=list.files(rootPath,recursive=TRUE)
searchFileName="needed.xml"
presentFile=grepl(searchFileName,listFiles)
if(nchar(presentFile)) cat("File",searchFileName,"is present at", presentFile,"\n")
Is this what you're looking for?
require(XML)
fol <- list.files("C:/Parsing")
for (i in fol){
dir <- paste("C:/Parsing" , i, "/needed.xml", sep = "")
if(file.exists(dir) == T){
needed <- xmlToList(dir)
}
}
This will locate your xml file and read it into R as a list. I wasn't clear from your question if you wanted the output to be the data itself or just the directory location of your data which could then be supplied to another function/script. If you just want the location, remove the 'xmlToList' function.
I would do something like this (replace *.xml with your filename.xml if you want):
list.files(path = "C:\Parsing", pattern = "*.xml", recursive = TRUE, full.names = TRUE)
This will recursively look for files with extension .xml in the path C:\Parsing and return the full path of the matched files.

Resources