R loop through subfolders

I have a parent folder containing many sub-folders, and I need to read a file in each sub-folder. The file name, Aligned.bam, is identical across all sub-folders. Is there a way I can generate all the file paths for this file, so that I can write a loop to read each BAM file?
parent.folder <- '/N/slate/ATACseq'
subfolders <- list.dirs(parent.folder, recursive=TRUE)[-1]
subfolders/Aligned.bam   <- I need something like this.
How do I achieve this in R?

You could try doing:
Files <- list.files(path = "full_path_to_parent_folder", pattern = "Aligned\\.bam$", recursive = TRUE, full.names = TRUE)
This will output the paths to all the Aligned.bam files under the parent folder you set in path.
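Once you have that vector, the loop over the BAM files is straightforward. A minimal sketch, assuming the BAMs are read with the Rsamtools Bioconductor package (any other BAM reader would slot in the same way):
library(Rsamtools)  # assumed reader; install with BiocManager::install("Rsamtools") if needed
Files <- list.files(path = "/N/slate/ATACseq", pattern = "Aligned\\.bam$",
                    recursive = TRUE, full.names = TRUE)
bam_list <- list()
for (f in Files) {
  ## key each result by its sub-folder so the samples stay distinguishable
  bam_list[[basename(dirname(f))]] <- scanBam(f)
}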

Related

Read all files in specific folder in R

I am trying to read all files in a specific sub-folder of the working directory. I have been able to add a for loop successfully, but the loop only looks at files within the working directory itself. I thought the line
directory <- 'folder.I.want.to.look.in'
would take care of this, but the script still only looks in the working directory. However, the command above does help create a list of the correct files. I have included the script below, but I am not sure what I need to modify to aim it at a specific sub-folder.
directory <- 'folder.I.want.to.look.in'
files <- list.files(path = directory)
out_file <- read_excel("file.to.be.used.in.output", col_names = TRUE)
for (filename in files) {
  show(filename)
  filepath <- paste0(filename)
  ## Import data
  data <- read_excel(filepath, skip = 8, col_names = TRUE)
  data <- data[, -c(6:8)]
  ## further script
}
The further script is irrelevant to this question and works fine; I just can't get the loop to go over each file in files from directory. Many thanks in advance.
Set your base directory, and then use it to create a vector of all the files with list.files, e.g.:
base_dir <- 'path/to/my/working/directory'
all_files <- file.path(base_dir, list.files(base_dir, recursive = TRUE))
Then just loop over all_files. By default, list.files has recursive = FALSE, i.e., it only returns the files and directory names of the directory you specify, rather than going into each subfolder. Setting recursive = TRUE returns the file paths relative to your base directory, which is why we join them back onto base_dir with file.path (passing full.names = TRUE would give you usable paths in one step).
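With the full paths in hand, the loop from the question only needs to use them directly. A minimal sketch, assuming readxl is the package providing read_excel:
library(readxl)  # assumed, since read_excel is used in the question
base_dir  <- 'folder.I.want.to.look.in'
all_files <- file.path(base_dir, list.files(base_dir, recursive = TRUE))
for (filepath in all_files) {
  show(filepath)
  data <- read_excel(filepath, skip = 8, col_names = TRUE)  # the full path now resolves
  data <- data[, -c(6:8)]
  ## further script ...
}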

Copy files from folders and sub-folders to another folder while preserving the folder structure

I have created a list of files selected by some conditions, and I want to copy only the files from that list to a new folder, with the same sub-folders as in the origin folder.
The structure of the folders is year/month/day.
This is the code I tried:
from.dir <- "J:/Radar_data/Beit_Dagan/RAW/2018"
## I want only the files from the night
to.dir <- "J:/Radar_data/Beit_Dagan/night"
files <- list.files(path = from.dir, full.names = TRUE, recursive = TRUE)
## night_files is a vector I created with the files I need - only during the night
for (f in night_files) file.copy(from = f, to = to.dir)
But I get all the files in one folder. Part of my list looks like this:
[1] "J:/Radar_data/Beit_Dagan/H5/2018/03/10/TLV180310142554.h5"
[2] "J:/Radar_data/Beit_Dagan/H5/2018/03/10/TLV180310142749.h5"
[3] "J:/Radar_data/Beit_Dagan/H5/2018/03/10/TLV180310143054.h5"
Is there a way to keep the structure of the folder and the sub-folders when copying?
I want to get the same year/month/day structure in the new "night" folder.
You need to use the flag recursive = TRUE inside the file.copy call, so you don't really need to loop over the files inside the directory.
from <- paste0(getwd(), "/output/", "output_1")
to <- paste0(getwd(), "/output/", "output_1_copy")
file.copy(from, to, recursive = TRUE)
Note that you need to create the /output_1_copy directory before the call. You can do it manually or with dir.create(...).
You just need:
file.copy(from = from.dir, to = to.dir, recursive = TRUE)
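If the goal is to copy only the filtered night_files while recreating the year/month/day structure, one option is to rebuild each destination path and create its folder first. A minimal sketch, assuming night_files holds full paths that start with from.dir:
## relative paths of the selected files (strip from.dir plus the trailing slash)
rel_paths <- substring(night_files, nchar(from.dir) + 2)
dest <- file.path(to.dir, rel_paths)
## create the destination sub-folders, then copy the files across
invisible(lapply(unique(dirname(dest)), dir.create, recursive = TRUE, showWarnings = FALSE))
file.copy(from = night_files, to = dest)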

copy csv file from multiple directories to a new one in R

I am trying to extract many .csv files from multiple directories/subdirectories and copy them into a new folder, where I would like to end up with only .csv files.
The csv files are stored in subdirectories with the following structure:
D:\R data\main_folder\03\07\04\BBB_0120180307031414614.csv
D:\R data\main_folder\03\07\05\BBB_0120180307031414615.csv
I am using the list.files function to extract only the csv file names.
my_dirs <- list.files("D:\\R data\\main_folder\\", pattern = "\\.csv$",
                      recursive = TRUE, include.dirs = FALSE, full.names = FALSE)
The problem is that the csv files are listed with the directory path, e.g.
03/07/03/BBB_0120180307031414614.csv
and this even though full.names and include.dirs are set to FALSE.
This prevents me from copying those files into a new folder, as the name is not recognized.
What am I doing wrong?
Thanks
Use the basename function coupled with list.files, like below.
If I understood you correctly, you want to fetch the names of the .csv files present in different directories.
I have made a temp folder in the Documents directory of my Windows machine. Inside that I have two folders, "one" and "two", and inside these folders I have csv files named "just_one.csv" and "just_two.csv".
So if I want to fetch the names "just_one.csv" and "just_two.csv", I can do this:
basename(list.files("C:/Users/C_Nfdl_99878314/Documents/temp", "\\.csv$", recursive = TRUE))
Which results to:
[1] "just_one.csv" "just_two.csv"

Creating a list of files from a list of directories in R

I have a list of many directories, each of which has 5 files inside. From the files within each directory I want to select one (for example, let us say the one with the extension .txt) and compile a list of these .txt files. How do I create a loop that selects the txt files from a list of directories in R?
You can do:
dir(path = ".", pattern = "\\.txt$", full.names = TRUE, recursive = TRUE)
Here path is the root that contains all the folders you want to look in, pattern is a regular expression that matches the files you are interested in (in the example, all the files with the .txt extension), full.names returns the full path of the files, and recursive explores all the subfolders in path. This returns a vector with the full path of every file that matches your query.
list.files is a vectorized function already, so you can pass the vector of directories to it, no loop needed.
my_dirs <- c("foo/bar", "foo/baz")
all_text_files <- list.files(my_dirs, pattern = "\\.txt$", full.names = TRUE)
If you want a list separating files by directory...
split(all_text_files, dirname(all_text_files))
If you have the list of directory names in dirs,
you can get the .txt files in all of them as a vector with:
files <- unlist(lapply(dirs, function(dir) list.files(path = dir, pattern = '\\.txt$')))
You can achieve the same using a loop as you asked,
but it's less elegant, and I don't recommend it:
files <- c()
for (dir in dirs) {
  files <- c(files, list.files(path = dir, pattern = '\\.txt$'))
}

Looping through folder and finding specific file in R

I am trying to loop through many folders in a directory, looking for a particular xml file buried in one of the folders. I would then like to save the location of that file and run my code against it (I will not include that code here). What I am asking is how to loop through all the folders and then open the specific file.
For example:
My main folder would be: C:\Parsing
It has two folders named "folder1" and "folder2".
Each folder has an xml file that I am interested in; let's say it's called "needed.xml".
I would like to have a script that loops through the directory and finds those particular files.
Do you know how I could do that in R?
Using list.files and grepl you can look recursively through all sub-folders:
rootPath <- "C:/Parsing"
listFiles <- list.files(rootPath, recursive = TRUE, full.names = TRUE)
searchFileName <- "needed.xml"
presentFile <- listFiles[grepl(searchFileName, basename(listFiles), fixed = TRUE)]
if (length(presentFile)) cat("File", searchFileName, "is present at", presentFile, "\n")
Is this what you're looking for?
require(XML)
fol <- list.files("C:/Parsing")
for (i in fol) {
  dir <- file.path("C:/Parsing", i, "needed.xml")
  if (file.exists(dir)) {
    needed <- xmlToList(dir)
  }
}
This will locate your xml file and read it into R as a list. It wasn't clear from your question whether you wanted the output to be the data itself or just the location of your data, which could then be supplied to another function/script. If you just want the location, remove the xmlToList call.
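If the location is all you need, the same loop can simply collect the matching paths instead of reading them; a small sketch of that variant:
locations <- character(0)
for (i in fol) {
  candidate <- file.path("C:/Parsing", i, "needed.xml")
  if (file.exists(candidate)) locations <- c(locations, candidate)
}
locations  # full path of every needed.xml found one level down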
I would do something like this (replace the .xml pattern with your specific file name if you want):
list.files(path = "C:/Parsing", pattern = "\\.xml$", recursive = TRUE, full.names = TRUE)
This will recursively look for files with the .xml extension under C:/Parsing and return the full paths of the matched files.
