Read all files in specific folder in R

I am trying to read all files in a specific sub-folder of the wd. I have been able to add a for loop successfully, but the loop only looks at files within the wd. I thought the command line:
directory <- 'folder.I.want.to.look.in'
would enable this, but the script still only looks in the wd. The command does, however, produce a list of the correct files. I have included the script I have written below, but I am not sure what I need to modify to aim it at a specific sub-folder.
directory <- 'folder.I.want.to.look.in'
files <- list.files(path = directory)
out_file <- read_excel("file.to.be.used.in.output", col_names = TRUE)
for (filename in files){
show(filename)
filepath <- paste0(filename)
## Import data
data <- read_excel(filepath, skip = 8, col_names = TRUE)
data <- data[, -c(6:8)]
further script
}
The further script is irrelevant to this question and works fine. I just can't get the loop to look over each file in files from directory. Many thanks in advance

Set your base directory, and then use it to create a vector of all the files with list.files, e.g.:
base_dir <- 'path/to/my/working/directory'
all_files <- file.path(base_dir, list.files(base_dir, recursive = TRUE))
Then just loop over all_files. By default, list.files has recursive = FALSE, i.e., it only returns the names of the files and directories directly inside the directory you specify, rather than descending into each subfolder. With recursive = TRUE it also descends into subfolders. Either way the returned paths are relative to base_dir (full.names defaults to FALSE), which is why we join them to base_dir with file.path; passing full.names = TRUE to list.files has the same effect.
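For the sub-folder case in the question, a minimal sketch is to ask list.files for full paths so that each filename already carries the folder. The folder and file names below are made up for illustration, and CSV files in a temporary directory stand in for the Excel files so the example runs with base R only; with readxl loaded, read_excel(filepath, skip = 8) would take the place of read.csv.

```r
# Hypothetical sub-folder with two small files (illustration only)
directory <- file.path(tempdir(), "folder_to_read")
dir.create(directory, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(directory, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:6), file.path(directory, "b.csv"), row.names = FALSE)

# full.names = TRUE prepends the folder, so the paths work from any wd
files <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)

results <- list()
for (filepath in files) {
  # basename() recovers the bare filename for use as a label
  results[[basename(filepath)]] <- read.csv(filepath)
}
```

The original loop failed because paste0(filename) returns the bare filename unchanged, which R then resolves against the working directory.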

Related

R loop through subfolders

I have a parent folder containing many sub-folders. I need to read a file in each sub-folder. The file name is Aligned.bam, which is identical across all sub-folders. Is there a way I can generate all the file paths for this file, so that I can write a loop to read each bam file?
parent.folder <- '/N/slate/ATACseq'
subfolders <- list.dirs(parent.folder, recursive=TRUE)[-1]
subfolders/Aligned.bam--------->need something like this.
How can I achieve this in R?
You could try doing:
Files <- list.files(path = "full_path_to_parent_folder", pattern = "^Aligned\\.bam$", recursive = TRUE, full.names = TRUE)
This would output the paths to all Aligned.bam in your desired parent folder set in path.
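If the sub-folder list from list.dirs is already at hand, the paths can also be built directly with file.path, which is vectorised. A minimal sketch, using a throwaway directory tree in place of /N/slate/ATACseq (the sample_1 and sample_2 folder names are invented):

```r
# Build a small demonstration tree with one Aligned.bam per sub-folder
parent.folder <- file.path(tempdir(), "ATACseq_demo")
for (s in c("sample_1", "sample_2")) {
  dir.create(file.path(parent.folder, s), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(parent.folder, s, "Aligned.bam"))
}

# [-1] drops the parent folder itself, as in the question
subfolders <- list.dirs(parent.folder, recursive = TRUE)[-1]

# One candidate path per sub-folder; keep only those that actually exist
bam.paths <- file.path(subfolders, "Aligned.bam")
bam.paths <- bam.paths[file.exists(bam.paths)]
```

The file.exists filter matters when some sub-folders do not contain the file.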

read a single *.xlsx file in R without the use of filename but utilizing the *.xlsx

I download an xlsx file every day with a long unique name that includes the date. I need R to read the new xlsx file saved in the directory each day without my typing the unique name every time. My idea is to use *.xlsx, but whenever I try it, it always says the path does not exist:
excel_df <- read_excel("C:/Home/User/dbd/*.xlsx")
the code above does not work
This code says the same:
base <- as.character("C:/Home/User/dbd/*.xlsx")
files <- file.info(list.files(path = base, pattern = '*.xlsx',
full.names = TRUE, no.. = TRUE))
daily_numebrs<-readxl::read_excel(rownames(files)[order(files$mtime)][nrow(files)])
each line of results shows the
...path does not exist.
The path shouldn't contain the pattern:
path <- "C:/Home/User/dbd"
files <- list.files(path= path, full.names=T, pattern ='\\.xlsx$')
files
lapply(files, function(file) readxl::read_excel(file))
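Because the question asks for only the newest download, one option is to pick the file with the latest modification time instead of reading them all. This is a sketch under the assumption that the modification time reflects the download order; the temporary folder and file names are placeholders, and read_excel would be applied to the resulting path.

```r
# Placeholder folder with two empty files standing in for the downloads
path <- file.path(tempdir(), "dbd_demo")
dir.create(path, showWarnings = FALSE)
old_file <- file.path(path, "report_old.xlsx")
new_file <- file.path(path, "report_new.xlsx")
file.create(old_file, new_file)
# Force distinct modification times for the demonstration
Sys.setFileTime(old_file, Sys.time() - 3600)
Sys.setFileTime(new_file, Sys.time())

files <- list.files(path = path, pattern = "\\.xlsx$", full.names = TRUE)
# file.info() returns one row per file; mtime is the modification time
newest <- files[which.max(as.numeric(file.info(files)$mtime))]
# daily_numbers <- readxl::read_excel(newest)
```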

Copy files from nested folders to new nested folders

I am trying to copy a large number of files from one folder to another. We need to restructure the folders, so there is a translation from the old folder path to a new one. The old folder structure is also nested.
Currently the code I have is not throwing any errors, but is returning false on executing the file.copy for all files.
ETA: When I copy a single file, it works.
allFilePaths <- list.files('./oldTopLevelFolder', recursive = TRUE)
testIds <- c(1:4)
otherTestIds <- c(5:8)
allNewFolders <- paste('newTopLevelFolder', testIds, 'aFolderName', otherTestIds, sep = '/')
lapply(allNewFolders, dir.create, recursive = TRUE)
file.copy(from=allFilePaths, to=allNewFolders,
copy.mode = TRUE)
file.copy can copy multiple files, but only to a single destination folder by the looks of it.
To copy a set of files into different destination folders, the following will do the job, where allFilePathsWithRoot is a vector containing the full path at which each file currently exists, and allNewFilePaths is a vector of the same length containing the new folder path for each file.
# function to copy a single file
copySingleFile <- function(oldPath, newPath) {
file.copy(from=oldPath, to=newPath,
copy.mode = TRUE)
}
# copy each file to its new folder path
mapply(copySingleFile, allFilePathsWithRoot, allNewFilePaths)
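A self-contained sketch of the same idea follows. The folder and file names are invented, and full.names = TRUE is an addition here: without it list.files returns paths relative to the folder it searched, which file.copy cannot resolve from the working directory.

```r
# Small demonstration tree (all names are placeholders)
root <- file.path(tempdir(), "copy_demo")
old_top <- file.path(root, "oldTopLevelFolder")
dir.create(file.path(old_top, "a"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(old_top, "b"), recursive = TRUE, showWarnings = FALSE)
writeLines("one", file.path(old_top, "a", "one.txt"))
writeLines("two", file.path(old_top, "b", "two.txt"))

allFilePathsWithRoot <- list.files(old_top, recursive = TRUE, full.names = TRUE)

# One destination folder per file, in the same order as the files
allNewFilePaths <- file.path(root, "newTopLevelFolder", c("1", "2"), "aFolderName")
for (d in allNewFilePaths) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# mapply pairs the i-th source file with the i-th destination folder
copied <- mapply(file.copy, from = allFilePathsWithRoot, to = allNewFilePaths,
                 MoreArgs = list(overwrite = TRUE))
```

The returned logical vector has one entry per file, so all(copied) is a quick check that nothing was skipped.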

Importing files with almost similar path and name

I have many txt files that I want to import into R. These files are imported one by one, I do the operations that I want, and then I import the next file.
All these files are located in a database system where all the folders have almost the same names, e.g.
database\type4\system50
database\type6\system50
database\type4\system30
database\type4\system50
Similarly, the names of the files are also almost the same, referring to the folder where they are located, e.g.
type4.system50.txt
type6.system50.txt
type4.system30.txt
type4.system50.txt
I have heard that there should be an easier way of importing these files one by one than repeated setwd and read.csv2 commands. As far as I understand, this is possible with the macro import function in SAS, where you specify an overall path and then, each time you want to import a file, you specify only what is specific to that file name/folder name.
Is there a similar function in R? I tried to look at
Importing Data in R like SAS macro
, but this question did not really show me how to specify the folder name/file name.
Thank you for your help.
If you want to specify folder name / file name, try this
databasepath="path/to/database"
## list all files
tmp <- list.files(databasepath, recursive = TRUE, full.names = TRUE)
## filter files you want to read
readmyfile <- function(foldername,filename){
tmp[which(grepl(foldername,tmp) & grepl(filename,tmp))]
}
files_to_read <- readmyfile("type4", "system50")
some_files <- lapply(files_to_read, read.csv2)
## Or you can read all of them (if memory is large enough to hold them)
all_files <- lapply(tmp,read.csv2)
Instead of calling setwd repeatedly, you could build the full path for each file, save all of the paths to a vector, loop through that vector, and load the files into a list:
library(data.table)
file_dir <- "path/to/files/"
file_vec <- list.files(path = file_dir, pattern = "\\.txt$")
file_list <- list()
# Loop over the file names (file_vec), not the still-empty file_list
for (n in seq_along(file_vec)){
file_list[[n]] <- fread(input = paste0(file_dir, file_vec[n]))
}
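Since both the folder and the file name follow from the same two pieces, a small helper can build the path from them, which is roughly what the SAS-style macro import described in the question does. This is a sketch: read_system and its arguments are hypothetical names, and the layout database/typeN/systemM/typeN.systemM.txt is assumed from the examples in the question.

```r
# Throwaway copy of the assumed layout, with one semicolon-separated file
database <- file.path(tempdir(), "database")
dir.create(file.path(database, "type4", "system50"),
           recursive = TRUE, showWarnings = FALSE)
writeLines(c("id;value", "1;2,5"),
           file.path(database, "type4", "system50", "type4.system50.txt"))

# Hypothetical helper: builds the path from the two varying parts
read_system <- function(type, system, base = database) {
  folder <- file.path(base, paste0("type", type), paste0("system", system))
  file <- sprintf("type%s.system%s.txt", type, system)
  read.csv2(file.path(folder, file))
}

d <- read_system(4, 50)
```

No setwd call is needed, because every path is built from the base folder.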

Copy multiple files from multiple folders to a single folder using R

I want to ask how to copy multiple files from multiple folders to a single folder using R.
Assuming there are three folders:
desktop/folder_A/task/sub_task/
desktop/folder_B/task/sub_task/
desktop/folder_C/task/sub_task/
In each of the sub_task folder, there are multiple files. I want to copy all the files in the sub_task folders and paste them in a new folder (let's name this new folder as "all_sub_task") on desktop. Can anyone show me how to do it in R using the loop or apply function? Thanks in advance.
Here is an R solution.
# Manually enter the directories for the sub tasks
my_dirs <- c("desktop/folder_A/task/sub_task/",
"desktop/folder_B/task/sub_task/",
"desktop/folder_C/task/sub_task/")
# Alternatively, if you want to programmatically find each of the sub_task dirs
my_dirs <- list.files("desktop", pattern = "^sub_task$", recursive = TRUE, include.dirs = TRUE, full.names = TRUE)
# Grab all files from the directories; unlist() flattens the result into one character vector
files <- unlist(lapply(my_dirs, list.files, full.names = TRUE))
# Your output directory to copy files to
new_dir <- "all_sub_task"
# Make sure the directory exists
dir.create(new_dir, recursive = TRUE)
# Copy the files
for(file in files) {
# See ?file.copy for more options
file.copy(file, new_dir)
}
Edited to programmatically list sub_task directories.
This code should work. The function takes one directory (for example desktop/folder_A/task/sub_task/) and copies everything in it to a second one. You can use a loop or an apply function to handle more than one source directory at once, since the second argument is fixed: sapply(froms, copyEverything, to = to)
copyEverything <- function(from, to){
# We search all the files and directories
files <- list.files(from, recursive = TRUE)
dirs <- list.dirs(from, recursive = TRUE, full.names = FALSE)
# We create the required directories
dir.create(to)
sapply(paste(to, dirs, sep = '/'), dir.create)
# And then we copy the files
file.copy(paste(from, files, sep = '/'), paste(to, files, sep = '/'))
}
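One thing to watch when flattening several folders into one: files with the same name in different sub_task folders collide, and file.copy with its default overwrite = FALSE skips a file whose name already exists in the target, returning FALSE for it. A hedged sketch that checks for this first; the directory tree and file names are invented for the demonstration.

```r
# Two demonstration sub_task folders, with one file name shared on purpose
root <- file.path(tempdir(), "flatten_demo")
my_dirs <- file.path(root, c("folder_A", "folder_B"), "task", "sub_task")
for (d in my_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
writeLines("a", file.path(my_dirs[1], "notes.txt"))
writeLines("b", file.path(my_dirs[2], "notes.txt"))
writeLines("c", file.path(my_dirs[2], "extra.txt"))

files <- unlist(lapply(my_dirs, list.files, full.names = TRUE))

# Base names that occur more than once would clash in the target folder
clashes <- unique(basename(files)[duplicated(basename(files))])

new_dir <- file.path(root, "all_sub_task")
dir.create(new_dir, showWarnings = FALSE)
# Copy only files whose names are unique; the clashes need separate handling
ok <- file.copy(files[!basename(files) %in% clashes], new_dir, overwrite = TRUE)
```

How to handle the clashing files (renaming, prefixing with the source folder, or overwriting) depends on the use case and is left open here.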

Resources