I am trying to loop through many folders in a directory, looking for a particular xml file buried in one of the folders. I would then like to save the location of that file and then run my code against that file (I will not include that code in this). What I am asking here is to loop through all the folders and then open the specific file.
For example:
My main folder would be: C:\Parsing
It has two folders named "folder1" and "folder2".
each folder has an xml file that I am interested in, lets say its called "needed.xml"
I would like to have a scrip that loops through the directory and finds those particular scripts.
Do you know how I could that in R.
Using list.files and greplyou could look recursively through all sub-folders
rootPath="C:\Parsing"
listFiles=list.files(rootPath,recursive=TRUE)
searchFileName="needed.xml"
presentFile=grepl(searchFileName,listFiles)
if(nchar(presentFile)) cat("File",searchFileName,"is present at", presentFile,"\n")
Is this what you're looking for?
require(XML)
fol <- list.files("C:/Parsing")
for (i in fol){
dir <- paste("C:/Parsing" , i, "/needed.xml", sep = "")
if(file.exists(dir) == T){
needed <- xmlToList(dir)
}
}
This will locate your xml file and read it into R as a list. I wasn't clear from your question if you wanted the output to be the data itself or just the directory location of your data which could then be supplied to another function/script. If you just want the location, remove the 'xmlToList' function.
I would do something like this (replace *.xml with your filename.xml if you want):
list.files(path = "C:\Parsing", pattern = "*.xml", recursive = TRUE, full.names = TRUE)
This will recursively look for files with extension .xml in the path C:\Parsing and return the full path of the matched files.
Related
In my main folder i have many sub folders like AA,BB,CC,DD ...etc. and all folders have a common script named run_script.R and i want to run this script in every folder. folder can be any amount.
Its working abut running in first folder only ,but i wanted it to run in every folder.
also when i am using setwd(folder) then showing error
Error in setwd(folder) : cannot change working directory
data_folder <- "C:/Users/mosho/Desktop/New folder (2)/"
allfolders <- data.frame(Folders = list.dirs(path = data_folder, recursive = F, full.names = F))
r_scripts <- "run_script.R"
for (folder in allfolders$Folders) {
#setwd(folder)
message(folder)
source(paste0(data_folder,folder,"/",r_scripts))
}
You are on a right path, I did some minor tweaks to your script which will resolve the issue. The points missing in your scripts are;
the allfolders contains the folder name not the entire explicit path. To set the working directory you need to set give the explicit path, by only calling the folder name will result into error unless you existing working directory is contains that folder. Anyways, its best practice to work with full path names.
also to simplify setting up allfolders as list for iterator will make your life lot easier than a data frame
Below is my work-out;
I created some dummy folders (DIC01, DIC02, DIC03...) under path "C:\Users\XXXXXX\Documents\TEST MAIN", and placed code run_script.R inside each one. This run_script.R contains simple code print("Hello World !!")
Next I set initial working directory where to the path where all the folders present i.e. to path "C:\Users\XXXXXX\Documents\TEST MAIN". Next listed the folders/directories present within this path as a list instead of data frame. Next is for loop which iterate over list of folder names. Inside we reset the working directory by the folder name and source the R code.
data_folder <- "C:\\Users\\XXXXXX\\Documents\\TEST MAIN"
setwd(data_folder)
allfolders <- list.dirs(path = data_folder, recursive = F, full.names = F)
r_scripts <- "run_script.R"
for (folder in allfolders) {
print(folder)
setwd(paste0(data_folder,"\\",folder))
source(paste0(data_folder,"\\",folder,"\\",r_scripts))
}
The result I get after the execution is something like this. First the name of the directory and then execution result.
I hope this resolves you problem. If yes Like/Up vote the answer and let me know.
I have a directory which contains different subfolders and other files. I need to access each subfolder, read the .tsv file and carry out the following rscript. How to loop this rscript and run it from the terminal?
for(i in my_files){
s <- read.csv('abundance.tsv',sep = '\t')
colnames(compare)[1] <- 'target_id'
colnames(s)[1] <- 'target_id'
s1 <- merge(compare, s, by = "target_id")
output.filename <- gsub("(.*?)", "\\1.csv", i)
write.table(s1, output.filename)
}
list.dirs() returns a list of the directories in the given path and list.files() a list of files in a given path, see here for the documentation.
list.dirs() can be recursive or not, so you can get only directory at the first level and then call list.dirs() again on each sub-directories (inside a loop) or directly get all the sub-directories.
With these two functions you can build your my_files array (since I do not know exactly your directory structure, I can't give an example).
If you have multiples files and want to open only some of them, you can check if the file name contains some sub-string you want (e.g. the file extension). The way to do it is shown here.
I am trying to create a new folder with all the files I need. I have a datatable called newlist with a column called fullpath which has the file path for each file.
I have tried the code below but the error message says "'from' path too long" so I don't think it recognises the values as separate file paths.
file.copy(from=newlist[,"fullpath"], to=destination, overwrite=TRUE, recursive=TRUE)
I think I need to specify the files to copy first using the list.files() function but I am unsure how to do this with a column of files in a datatable.
Try this:
lapply(newlist[,"fullpath"], function(x) file.copy(from=x, to=destination, overwrite=TRUE, recursive=TRUE))
Edit:
If your paths do not contain extensions, you can use this code instead:
lapply(newlist[,"fullpath"], function(x) {
# Find the file in the given directory with the basename and add a wildcard extension
f <- file.path(dirname(x), list.files(dirname(x), paste0(basename(x), ".*")))
file.copy(from=f, to=destination, overwrite=TRUE, recursive=TRUE)
})
I have a folder Tmin which contains 18 folders. Each of the 18 folders contains hundreds of file. I would like to create a program with R that allow to add the name of the folder files for each file. I do not want to rename each of the file with a different name, I only want to add the folder name at the beginning of the file name. I am new in R and in programming. I was not able to have a batch function that can repeat the operation for each folder. You can find attached two pictures, which show what I would like to obtain.
For example, the file called "name_date.tiff" contained in the folder "MACA_Miroc" will become "MACA_Miroc_name_date.tiff". Moreover, I would like to repeat the operation automatically for each folder. Thanks in advance for any help!
Wanted situation and organization of my folders and file
This ought to work:
mydir <- getwd()
primary_folder <- "C:/Users/Desktop/Test_Data/"
subfolders <- grep("*MACA*", list.dirs(primary_folder, full.names = T, recursive = F),
value = T)
renameFunc <- function(z){
setwd(z)
fnames <- dir(recursive = F, pattern= ".tiff|.csv")
addname <- substr(z, nchar(primary_folder)+2, nchar(z))
lapply(fnames, function(current_name){
#Regex to get extension, may need to addd $ sign to signify end of file name
ptrn <- ".*\\.([a-zA-Z]{2,4})"
extension <- regmatches(current_name, regexec(ptrn, current_name))[[1]][2]
no_extension <- gsub(paste(".",extension, sep = ""), "", current_name)
new_name <- paste(gsub("_"," ", no_extension), " ", addname, ".", extension, sep = "")
file.rename(current_name, new_name)
})
}
lapply(subfolders, readFunc)
setwd(mydir)
I think if you're not in the directory where you want to change file names, you must specify the full name. Changing the working directory was a quick way but you could use full names (using regular expressions to get the correct from and to values for file.rename()). I got some errors at one poing when I was not in the directory where I wanted to change the name.
I feel this allows more control over which folders you want to change the names in since incorrect operation can be very messy. You may also want to skip some file extensions or subfolders etc.
Your path folder
folder<-"C:/path/example/"
Extract files list
files<-list.files(folder)
Extract folder name
folder_name<-unlist(strsplit(folder,"/"))[length(unlist(strsplit(folder,"/")))]
Rename all files
file.rename(from = paste0(folder,files),to = paste0(folder,folder_name,"_",files))
I have an object called wanted.bam with the list of wanted file names for all the .bam (is the extension) files in three of my directories path1,path2,path3. I am looping over all these directories to search for the wanted files. What I am trying to do is look for wanted files by looping over each directory and implement a FUNCTION in each file. This loop works for all the matched file in the first directory, but as it progresses to another directory, it breaks giving an error:
Error in value[[3L]](cond) :
failed to open BamFile: file(s) do not exist:
'sort.bam'
my code:
bam.dir<- c("path1","path2","path3")
for (j in 1:length(bam.dir)){
all.bam.files <- list.files(bam.dir[j])
all.bam.files <- grep(wanted.names, all.bam.files, value=TRUE)
print(paste("The wanted number of bam files in this directory:", (length(all.bam.files))))
if(length(all.bam.files)==0){
next
}else{
setwd(bam.dir[j])
}
print(paste("The working directory number:",j,":",(getwd())))
## ****using another loop here for each file to implement a function*****
all.FAD<- {}
for(i in 1:length(all.bam.files)){
output<- FUNCTION(all.bam.files[i])
}
}
You probably don't want to be changing working directory like this. Instead, use the option in list.files, full.names=TRUE, to return the full path of your files. Then, you can just use read.csv, or whatever, on the full path name without need to change directory. Your code is failing because after you set directory, the relative path to the next directory is changed.
If you want to keep changing directories, just make sure you set the directory back to the base directory at the end of the loop.