Iterate over the .txt files in several subfolders under a folder in R

I am new to R. I have several .txt files in several sub-folders under one folder. The structure of the .txt files is the same.
I would like to iterate over the files in each sub-folder and generate a single file for the whole folder.
I coded as follow:
parent.folder <- "C:/.../18_0101"  # Folder containing sub-folders
sub.folders <- list.dirs(parent.folder, recursive = TRUE)[-1]  # Sub-folders
r.scripts <- file.path(sub.folders)
HR_2018 <- list()
for (j in seq_along(r.scripts)) {
  HR_2018[[j]] <- dir(r.scripts[j], "\\.txt$")
}
When I checked HR_2018[[1]], I found only the list of .txt file names under the first sub-folder. From there, I would like to analyze the files in each sub-folder, then repeat the same process for the other sub-folders and generate a single file.
Can anyone help me?
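One possible next step (a sketch, not the original poster's code; it assumes the .txt files are tab-delimited with a header row, and keeps the elided `C:/.../18_0101` path from the question) is to collect the full paths of every .txt file across all sub-folders and stack them into one data frame:

```r
parent.folder <- "C:/.../18_0101"  # folder containing sub-folders (path elided as in the question)
sub.folders <- list.dirs(parent.folder, recursive = TRUE)[-1]

# Full paths of all .txt files across every sub-folder
txt.files <- list.files(sub.folders, pattern = "\\.txt$", full.names = TRUE)

# Read each file and stack them into a single data frame,
# then write one combined file for the whole folder
all.data <- do.call(rbind, lapply(txt.files, read.table, header = TRUE, sep = "\t"))
write.table(all.data, file.path(parent.folder, "combined.txt"),
            sep = "\t", row.names = FALSE)
```

`list.files()` accepts a vector of paths, so the inner per-folder loop is not needed once `full.names = TRUE` supplies complete paths.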

Related

How to transfer a large number of Rda files to csv files

I have a bunch of folders from 20180101 to 20180331. Each folder has approximately 500 Rda files. I wonder how I can convert all these files to csv?
folders <- list.files(".")
for (folder in folders) {
  files <- list.files(folder)
  for (file in files) {
    path_n_name <- paste0(folder, "/", file)
    raw_read <- readRDS(path_n_name)
    # escape the dot and anchor the pattern so only the extension is replaced
    out_path_n_name <- gsub("\\.rds$", ".csv", path_n_name, ignore.case = TRUE)
    write.csv(raw_read, out_path_n_name)
  }
}
For example, in your current working directory there are a bunch of folders. The first line, folders <- list.files("."), gets the names of all the folders, and then you iterate over this vector. In each iteration, you get the names of all files in that folder, then convert each one to .csv. Note that it only makes sense to save some data types to .csv (e.g. a data frame).
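The nested loops can also be collapsed into a single pass with a recursive list.files() call (a sketch; like the answer above, it assumes the files are .rds-style files readable by readRDS):

```r
# All .rds files in all sub-folders, with paths relative to the working directory
rds_files <- list.files(".", pattern = "\\.rds$", ignore.case = TRUE,
                        recursive = TRUE, full.names = TRUE)

for (path in rds_files) {
  raw_read <- readRDS(path)
  # only data-frame-like objects make sense as CSV
  if (is.data.frame(raw_read)) {
    write.csv(raw_read, sub("\\.rds$", ".csv", path, ignore.case = TRUE))
  }
}
```

With recursive = TRUE there is no need to build the paths by hand, so the folder/file bookkeeping disappears.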

Run R script in different folder than data in Ubuntu Server

I have an R script in one directory that takes in the files in a different directory, combines them into one file, and outputs a new excel file, as shown below:
library(readxl)
library(dplyr)

first <- read_excel("file1.xlsx")
second <- read_excel("file2.xlsx")
third <- read_excel("file3.xlsx")
df <- bind_rows(first, second, third)
openxlsx::write.xlsx(df, "newfile.xlsx")
In my code, I can set the working directory to a particular folder by putting setwd("path/to/data"), but this only works for one directory. I'd like to make a shell script that loops through various folders.
I'd like to try something like
for i in folder1 folder2 folder3
do
  # Run Rscript myRscript.R in each folder
done
E.g. Folder 1 has file1, file2, and file3; Folder 2 has file1, file2, and file3; Folder 3 has file1, file2, and file3. I'd like the Rscript to sit one directory up from the folders, run in each folder, and generate a "newfile.xlsx" for each folder (each folder holds a different set of data, but the file names are the same in every folder).
I want to avoid copying a version of the Rscript into each folder to avoid the folder changing nature of my request. Is this possible?
You can loop through the folders and files with R no problem.
library(readxl)
library(dplyr)
library(stringr)

folders <- list.dirs(recursive = FALSE)
for (folder in folders) {
  files <- list.files(folder)
  # extra: ignore non-xlsx files
  # files <- files[which(str_detect(files, "\\.xlsx$"))]
  df <- tibble()
  for (file in files) {
    temp <- read_excel(file.path(folder, file))
    df <- bind_rows(df, temp)
  }
  # creates a newfile.xlsx in each folder
  openxlsx::write.xlsx(df, file.path(folder, "newfile.xlsx"))
  # alternative: creates the newfile in the main folder
  # openxlsx::write.xlsx(df, paste0(folder, "_newfile.xlsx"))
}
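For the shell side of the question, the loop the asker sketched could look like this (a sketch only; the folder names and `myRscript.R` path are the question's hypothetical ones, and the parentheses run each `cd` in a subshell so it does not leak into the next iteration):

```shell
#!/bin/sh
# Run myRscript.R inside each data folder; the script then sees
# file1.xlsx etc. in its current working directory.
for dir in folder1 folder2 folder3
do
    ( cd "$dir" && Rscript ../myRscript.R )
done
```

With this shape the R script itself never needs setwd(): it always runs with the data folder as its working directory.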

How do I read through multiple files in different folders and store them separately based on the folder from which they've been retrieved?

The main idea is that I have two folders/paths on my local machine. In each folder, I have multiple csv files I want to read into R. However, instead of appending them all into one file, I want all folder1 files in file1 and all folder2 files in file2. I only know how to append them all together, not how to append them into two separate files. Below is my code so far.
library(data.table)

dirs <- list("path/folder1", "path/folder2")
data <- list()
for (dir in dirs) {
  ## read in the list of files in each folder
  flist <- list.files(path = dir, pattern = "\\.csv$")
  ## a second for loop to read through what's inside each folder
  for (file in flist) {
    message("working on ", file)
    indata <- fread(file.path(dir, file))
    data <- rbind(data, indata)
  }
}
So far, data keeps everything in one object. How do I save them into two separate objects instead?
The quickest option I can think of is to use data[[dir]] to make each directory's data its own element of the data list. Then you can access them with data[["path/folder1"]] etc.
library(data.table)

dirs <- list("path/folder1", "path/folder2")
data <- list()
for (dir in dirs) {
  ## read in the list of files in each folder
  flist <- list.files(path = dir, pattern = "\\.csv$")
  ## a second for loop to read through what's inside each folder
  for (file in flist) {
    message("working on ", file)
    indata <- fread(file.path(dir, file))
    data[[dir]] <- rbind(data[[dir]], indata)
  }
}
(However, it might be much nicer (and faster) to use lapply instead of for loops)
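The lapply version hinted at could look like this (a sketch, assuming data.table for fread/rbindlist; the folder paths are the question's placeholders):

```r
library(data.table)

dirs <- c("path/folder1", "path/folder2")

# One combined data.table per directory, named by the directory path
data <- lapply(dirs, function(dir) {
  flist <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)
  rbindlist(lapply(flist, fread))
})
names(data) <- dirs
# access with data[["path/folder1"]], data[["path/folder2"]], etc.
```

rbindlist() grows the result once instead of copying the accumulated table on every rbind(), which is where the speed difference comes from.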
You could assign your read-in files to new R objects named by folder number. I changed list() to c() for dirs for easier assignment with assign(), and moved data <- list() into the first loop so it is reset before each folder is processed.
dirs <- c("path/folder1", "path/folder2")
for (dir in seq_along(dirs)) {
  ## read in the list of files in each folder
  flist <- list.files(path = dirs[dir], pattern = "\\.csv$")
  data <- list()
  ## a second for loop to read through what's inside each folder
  for (file in flist) {
    message("working on ", file)
    indata <- read.csv(paste0(dirs[dir], "/", file))
    data <- rbind(data, indata)
    assign(paste0("data_", dir), data)
  }
}

Zip files without directory name in R

Inside the working directory I have folders whose names end in "_txt", each containing files. I want to zip all the folders with their original names and the files inside them. Everything works, but the .zip also contains the name of the directory, which I don't want: e.g. "1202_txt.zip\1202_txt\files" needs to be "1202_txt.zip\files".
dir.create("1202_txt")  # creating a folder inside the working directory
array <- list.files(pattern = "*_txt")
for (i in seq_along(array)) {
  name <- paste0(array[i], ".zip")
  # zip(name, files = paste0(d, paste0("/", array[i])))
  zip(name, files = array[i])
}
The code above comes from Creating zip file from folders in R.
Note: Empty folders can be skipped
Can you please try this? (using R 3.5.0, macOS High Sierra 10.13.6)
dir_array <- list.files(getwd(), "*_txt")
zip_files <- function(dir_name) {
  zip_name <- paste0(dir_name, ".zip")
  zip(zipfile = zip_name, files = dir_name)
}
Map(zip_files, dir_array)
This should zip all the folders inside the current working directory with the specified name. The zipped folders are also housed in the current working directory.
Here is the approach I used to achieve my desired result; it is tricky, but it works.
setwd("c:/test")
dir.create("1202_txt")  # creating a folder inside the working directory, with some CSV files in there
array <- list.files("c:/test", "*_txt")
for (i in seq_along(array)) {
  name <- paste0(array[i], ".zip")
  # use the absolute path: getwd() changes inside the loop
  Zip_Files <- list.files(path = file.path("c:/test", array[i]), pattern = "\\.csv$")
  # moving the working directory into the folder
  setwd(file.path("C:\\test\\", array[i]))
  # zipping the files inside the directory
  zip::zip(zipfile = name, files = Zip_Files)
  # moving the zip file from inside the folder to the parent
  file.rename(name, paste0("C:\\test\\", name))
  print(name)
}

How do I create a loop in a file path?

UPDATE
Thanks for the suggestions. This is how far I got, but I still don't see how to get the loop variable to work within the file path name.
setwd("//tsclient/C/Users/xxx")
folders <- list.files("TEST")  # this gives me a list of my folder names
for (f in folders) {
  setwd("//tsclient/C/xxx/[f]")
  files[f] <- list.files("//tsclient/C/Users/xxx/TEST/[f]", pattern = "*.TXT")
  mergedfile[f] <- do.call(rbind, lapply(files[f], read.table))
  write.table(mergedfile[f], "//tsclient/C/Users/xxx/[f].txt", sep = "\t")
}
I have around 100 folders, each containing multiple txt files. I want to create 1 merged file per folder and save that elsewhere. However, I do not want to manually adapt the folder name in my code for each folder.
I created the following code to load all files from a single folder (which works) and merge them.
library(data.table)

setwd("//tsclient/C/xxx")
files <- list.files("//tsclient/C/Users/foldername", pattern = "*.TXT")
file.list <- lapply(files, read.table)
setattr(file.list, "names", files)
masterfilesales <- rbindlist(file.list, idcol = "id")[, id := substr(id, 1, 4)]
write.table(masterfilesales, "//tsclient/C/Users/xxx/datasets/foldername.txt", sep = "\t")
If I wanted to do this manually, I would have to adapt "foldername" every time. The folder names are numeric: 100 numbers between 2500 and 5000 (always 4 digits).
I looked into repeat loops, but I could not get them to work inside a file path.
If anyone could direct me in a good direction, I would be very grateful.
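A possible shape for this (a sketch; the //tsclient/... paths are the question's own, with the xxx placeholders kept): the loop variable goes into the path via file.path()/paste0() rather than as [f] inside a string literal, and full.names = TRUE removes the need for setwd():

```r
base <- "//tsclient/C/Users/xxx/TEST"
folders <- list.files(base)  # the 100 four-digit folder names

for (f in folders) {
  # build the folder path from the loop variable
  files <- list.files(file.path(base, f), pattern = "\\.TXT$", full.names = TRUE)
  mergedfile <- do.call(rbind, lapply(files, read.table))
  # one merged file per folder, named after the folder
  write.table(mergedfile, paste0("//tsclient/C/Users/xxx/", f, ".txt"), sep = "\t")
}
```

R never substitutes [f] inside a quoted string; paste0() and file.path() are how a variable is spliced into a path.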
