loop in all files in a directory with r script - r

I have a directory that contains several subfolders and other files. I need to access each subfolder, read its .tsv file, and run the following R script on it. How can I loop this script over the subfolders and run it from the terminal?
for(i in my_files){
s <- read.csv('abundance.tsv',sep = '\t')
colnames(compare)[1] <- 'target_id'
colnames(s)[1] <- 'target_id'
s1 <- merge(compare, s, by = "target_id")
output.filename <- gsub("(.*?)", "\\1.csv", i)
write.table(s1, output.filename)
}

list.dirs() returns a character vector of the directories under a given path, and list.files() returns a character vector of the files under a given path; see their documentation (?list.files) for the details.
list.dirs() can be recursive or not, so you can either get only the first-level directories and then call list.dirs() again on each subdirectory (inside a loop), or get all the subdirectories directly.
With these two functions you can build your my_files vector (since I do not know your exact directory structure, I can't give an example).
If you have multiple files and want to open only some of them, you can check whether the file name contains a substring you want (e.g. the file extension), for instance with grepl() or the pattern argument of list.files().
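One possible shape for the loop, as a minimal sketch under stated assumptions: every subfolder holds a file ending in .tsv, compare is a data frame that has already been loaded, and each result is written next to its input with a .csv extension. The function name merge_abundance is made up for illustration.

```r
# Hypothetical helper (not from the question): merge every .tsv file found
# under `root` with the `compare` data frame and write one .csv per input.
merge_abundance <- function(root, compare) {
  colnames(compare)[1] <- "target_id"
  # recursive = TRUE descends into subfolders; full.names = TRUE keeps the path
  my_files <- list.files(root, pattern = "\\.tsv$",
                         recursive = TRUE, full.names = TRUE)
  out_files <- character(0)
  for (f in my_files) {
    s <- read.delim(f)                      # tab-separated, with a header row
    colnames(s)[1] <- "target_id"
    s1 <- merge(compare, s, by = "target_id")
    out <- sub("\\.tsv$", ".csv", f)        # same folder, new extension
    write.csv(s1, out, row.names = FALSE)
    out_files <- c(out_files, out)
  }
  out_files                                 # paths of the files written
}
```

If this lives in a script (say merge_abundance.R, a made-up name) that loads compare and then calls merge_abundance(commandArgs(trailingOnly = TRUE)[1], compare), it could be run from the terminal with Rscript merge_abundance.R path/to/directory.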

Related

How do I read through multiple files in different folders and store them separately based on the folder they were retrieved from?

The main idea is that I have two folders/paths on my local machine. In each folder, I have multiple CSV files that I want to read into R. However, instead of appending them all together into one file, I want all folder1 files to end up in file1 and all folder2 files in file2. I only know how to append them all together, not how to append them into two separate files. Below is my code so far.
dirs<-list("path/folder1","path/folder2")
data<-list()
for(dir in dirs){
##read in the list of files in each folder
flist<-list.files(path=dir,pattern = "\\.csv$")
## a second for loop to read thru what's inside each folder
for (file in flist){message("working on",file)
indata<-fread(paste0(dir,file))
data<-rbind(data,indata)}
}
So far, I think data keeps everything in one file. How do I make it save them into two different files?
The quickest option I can think of is to use data[[dir]] so that each directory's data becomes its own element of the data list. You can then access them with data[["path/folder1"]] and so on.
library(data.table)  # provides fread()
dirs <- list("path/folder1", "path/folder2")
data <- list()
for (dir in dirs) {
  ## read in the list of files in each folder
  flist <- list.files(path = dir, pattern = "\\.csv$")
  ## a second for loop to read through what's inside each folder
  for (file in flist) {
    message("working on ", file)
    ## file.path() inserts the separator that paste0(dir, file) would omit
    indata <- fread(file.path(dir, file))
    data[[dir]] <- rbind(data[[dir]], indata)
  }
}
(However, it might be much nicer (and faster) to use lapply instead of for loops)
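A minimal sketch of what an lapply version might look like, assuming the files in each folder share the same columns. It uses base read.csv rather than fread so that it runs without extra packages, and the name read_by_dir is made up for illustration.

```r
# Hypothetical helper: returns a named list with one combined data frame
# per directory, e.g. result[["path/folder1"]].
read_by_dir <- function(dirs) {
  dirs <- unlist(dirs)          # accept either a list or a character vector
  names(dirs) <- dirs           # lapply() carries these names to the result
  lapply(dirs, function(d) {
    files <- list.files(d, pattern = "\\.csv$", full.names = TRUE)
    # read each file, then stack the pieces row-wise
    do.call(rbind, lapply(files, read.csv))
  })
}
```

Calling read_by_dir(c("path/folder1", "path/folder2")) would then give a list with one element per folder, which avoids growing an object inside nested for loops.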
You could assign the files you read in to new R objects named after the folder's index (data_1, data_2). I changed list() to c() for dirs to make this easier with assign(), and moved data <- list() into the outer loop so that it is reset at the start of each folder.
dirs <- c("path/folder1", "path/folder2")
for (dir in seq_along(dirs)) {
  ## read in the list of files in each folder
  flist <- list.files(path = dirs[dir], pattern = "\\.csv$")
  data <- list()
  ## a second for loop to read through what's inside each folder
  for (file in flist) {
    message("working on ", file)
    indata <- read.csv(file.path(dirs[dir], file))
    data <- rbind(data, indata)
  }
  ## creates data_1, data_2, ... once the folder has been read
  assign(paste0("data_", dir), data)
}

List files starting with given name in subdirectories R

I have a directory containing 70 subdirectories with different CSV files. The CSV files in each directory look like this: Modified2-3.csv, added2_3.csv, Retired4_5.csv. My end goal is to join all CSV files whose names start with Modified, but before that, how can I loop through all subdirectories selecting only the files that start with Modified?
I have tried this, but it returns character(0):
list.files(pattern = "^Modified.*name.csv")
I want my result to be a list of the Modified CSV files, like this: Modified2_3.csv, Modified3_4.csv, Modified7_8.csv
You should be able to go through them without a loop with the use of list.files()'s recursive argument.
list.files(pattern = "^Modified", recursive=TRUE)
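The pattern in the question, "^Modified.*name.csv", only matches file names that contain the literal text "name" before the extension, which none of the example files do, and without recursive = TRUE only the top-level directory is searched. A minimal sketch of the search plus the join, assuming all the Modified files share the same columns (find_modified and join_modified are made-up names):

```r
# Hypothetical helper: every CSV under `root` whose file name starts with
# "Modified". The pattern is matched against the file name only, not the
# directory part, so "^" still anchors correctly with recursive = TRUE.
find_modified <- function(root = ".") {
  list.files(root, pattern = "^Modified.*\\.csv$",
             recursive = TRUE, full.names = TRUE)
}

# Hypothetical helper: stack the files found above into one data frame.
# Assumes they all have the same columns.
join_modified <- function(root = ".") {
  do.call(rbind, lapply(find_modified(root), read.csv))
}
```

basename(find_modified(root)) gives the bare file names if the directory part is not wanted.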

Looping through folder and finding specific file in R

I am trying to loop through many folders in a directory, looking for a particular xml file buried in one of the folders. I would then like to save the location of that file and then run my code against that file (I will not include that code in this). What I am asking here is to loop through all the folders and then open the specific file.
For example:
My main folder would be: C:\Parsing
It has two folders named "folder1" and "folder2".
Each folder has an XML file that I am interested in; let's say it's called "needed.xml".
I would like to have a script that loops through the directory and finds those particular files.
Do you know how I could do that in R?
Using list.files and grepl you could look recursively through all subfolders:
rootPath <- "C:/Parsing"
listFiles <- list.files(rootPath, recursive = TRUE)
searchFileName <- "needed.xml"
## fixed = TRUE matches the name literally rather than as a regular expression
presentFile <- grepl(searchFileName, listFiles, fixed = TRUE)
if (any(presentFile)) cat("File", searchFileName, "is present at", listFiles[presentFile], "\n")
Is this what you're looking for?
require(XML)
fol <- list.files("C:/Parsing")
for (i in fol) {
  ## file.path() adds the "/" that was missing between the root and the folder name
  dir <- file.path("C:/Parsing", i, "needed.xml")
  if (file.exists(dir)) {
    needed <- xmlToList(dir)
  }
}
This will locate your xml file and read it into R as a list. I wasn't clear from your question if you wanted the output to be the data itself or just the directory location of your data which could then be supplied to another function/script. If you just want the location, remove the 'xmlToList' function.
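I would do something like this (replace \\.xml$ with your file name, e.g. ^needed\\.xml$, if you want one specific file):
list.files(path = "C:/Parsing", pattern = "\\.xml$", recursive = TRUE, full.names = TRUE)
This will recursively look for files with the extension .xml under C:/Parsing and return the full path of each match. Note that pattern is a regular expression rather than a glob, and that a single backslash in an R string ("C:\Parsing") is an invalid escape, so the path is written with a forward slash.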
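To get the location of one specific file rather than every .xml file, comparing base names exactly avoids regular-expression surprises. A minimal sketch, where find_file is a made-up helper name:

```r
# Hypothetical helper: full paths of every file called `name` anywhere
# under `root`. Returns character(0) when nothing is found.
find_file <- function(root, name) {
  all_files <- list.files(root, recursive = TRUE, full.names = TRUE)
  # exact comparison on the file name, so "not_needed.xml" is not matched
  all_files[basename(all_files) == name]
}
```

For the layout in the question, find_file("C:/Parsing", "needed.xml") would return one path per folder that contains the file, and each path can then be passed to whatever code parses it.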

How to loop over each file in multiple directories in R

I have an object called wanted.bam holding the names of the wanted .bam files across three of my directories: path1, path2, path3. I loop over these directories to search for the wanted files, and I want to apply a FUNCTION to each matching file. The loop works for all the matched files in the first directory, but as soon as it moves on to the next directory it breaks with this error:
Error in value[[3L]](cond) :
failed to open BamFile: file(s) do not exist:
'sort.bam'
my code:
bam.dir<- c("path1","path2","path3")
for (j in 1:length(bam.dir)){
all.bam.files <- list.files(bam.dir[j])
all.bam.files <- grep(wanted.names, all.bam.files, value=TRUE)
print(paste("The wanted number of bam files in this directory:", (length(all.bam.files))))
if(length(all.bam.files)==0){
next
}else{
setwd(bam.dir[j])
}
print(paste("The working directory number:",j,":",(getwd())))
## ****using another loop here for each file to implement a function*****
all.FAD<- {}
for(i in 1:length(all.bam.files)){
output<- FUNCTION(all.bam.files[i])
}
}
You probably don't want to change the working directory like this. Instead, pass full.names = TRUE to list.files so that it returns the full path of each file. Then you can call read.csv, or whatever, on the full path without changing directory. Your code fails because once setwd() has moved you into the first directory, the relative paths of the remaining directories no longer resolve from the new working directory.
If you want to keep changing directories, just make sure you set the working directory back to the base directory at the end of each loop iteration.
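A minimal sketch of the full.names approach, with no setwd() at all. It assumes wanted is a character vector of exact file names and fun stands in for the question's FUNCTION; apply_to_wanted is a made-up name.

```r
# Hypothetical helper: apply `fun` to every wanted file found in `dirs`,
# without ever changing the working directory.
apply_to_wanted <- function(dirs, wanted, fun) {
  # list.files() accepts a vector of directories; full.names = TRUE keeps
  # each file's directory attached, so the paths stay valid from anywhere
  files <- list.files(dirs, full.names = TRUE)
  files <- files[basename(files) %in% wanted]
  names(files) <- files
  lapply(files, fun)          # one result per file, named by its path
}
```

Something like apply_to_wanted(bam.dir, wanted.bam, FUNCTION) would then return a list with one entry per matched file across all three directories.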

How can I read multiple files from multiple directories into R for processing?

I am running a simulation study and need to process and save the results from several text files. The data are organized into subdirectories, and within each subdirectory I need to process and get individual results for 1000 data files. This is very easy to do in SAS using macros. However, I am new to R and cannot figure out how to do this. Below is what I am trying to accomplish.
DATA Folder-> DC1 -> DC1R1.txt ... DC1R1000.txt
DC2 -> DC2R1.txt ... DC2R1000.txt
Any help would be greatly appreciated!
I'm not near a computer with R right now, but read the help for file-related functions:
The dir function will list the files and directories. It has a recursive argument.
list.files is an alias for dir. The file.info function will tell you (among other things) if a path is a directory and file.path will combine path parts.
The basename and dirname functions might also be useful.
Note that all these functions are vectorized.
EDIT Now at a computer, so here's an example:
# Make a function to process each file
processFile <- function(f) {
df <- read.csv(f)
# ...and do stuff...
file.info(f)$size # dummy result
}
# Find all .csv files
files <- dir("/foo/bar/", recursive=TRUE, full.names=TRUE, pattern="\\.csv$")
# Apply the function to all files.
result <- sapply(files, processFile)
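For the DC1/DC2 layout in the question, dirname and basename can be used to keep the results grouped by subdirectory. A minimal sketch, assuming .txt data files; the line count is only a placeholder for the real per-file analysis, and summarise_by_subdir is a made-up name.

```r
# Hypothetical helper: one data frame of per-file results for each
# subdirectory under `root`.
summarise_by_subdir <- function(root, pattern = "\\.txt$") {
  files <- list.files(root, pattern = pattern,
                      recursive = TRUE, full.names = TRUE)
  # group the paths by the name of the folder that directly contains them
  groups <- split(files, basename(dirname(files)))
  lapply(groups, function(fs) {
    data.frame(
      file = basename(fs),
      # placeholder result: number of lines in each file
      n_lines = vapply(fs, function(f) length(readLines(f)),
                       integer(1), USE.NAMES = FALSE)
    )
  })
}
```

The result is a list named after the subdirectories (DC1, DC2, and so on), so each element can be saved or inspected on its own.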
If you need to run the same analysis on each of the files, you can get them all in one shot using list.files(recursive = TRUE). This assumes that you have already set your working directory to the Data folder. recursive = TRUE lists all files within subdirectories as well.
filenames <- list.files("path/to/files", recursive=TRUE)
This will give you all the files residing under one folder and sub folders under it.
You can use Perl's glob() function to get a list of files and send it to R using, e.g., RSPerl's interface.
