Run R script in different folder than data in Ubuntu Server - r

I have an R script in one directory that reads the files in a different directory, combines them into one data frame, and writes a new Excel file, as shown below:
library(readxl) # read_excel()
library(dplyr)  # bind_rows()
first <- read_excel("file1.xlsx")
second <- read_excel("file2.xlsx")
third <- read_excel("file3.xlsx")
df <- bind_rows(first, second, third)
openxlsx::write.xlsx(df, "newfile.xlsx")
In my code, I can set the working directory to a particular folder by calling setwd("path/to/data"), but this only works for one directory. I'd like to write a shell script that loops through several folders.
I'd like to try something like
for i in folder1 folder2 folder3
do
# Run Rscript myRscript.R in each folder
done
For example, folder 1 has file1, file2, and file3, and so do folder 2 and folder 3. I'd like the R script to live one directory up from the folders, run in each folder, and generate a "newfile.xlsx" there. (Each folder holds a different set of data, but the file names are the same in every folder.)
I want to avoid copying a version of the R script into each folder just to deal with the changing folder. Is this possible?

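You can loop through the folders and files from within R itself. Run this from the directory that contains the data folders:
library(readxl)
library(dplyr)

# immediate subfolders only; by default list.dirs() is recursive and also returns "."
folders <- list.dirs(recursive = FALSE)
for (folder in folders) {
  # full.names = TRUE so that read_excel() gets a path that resolves from the parent directory
  files <- list.files(folder, pattern = "\\.xlsx$", full.names = TRUE)
  # skip the output of a previous run
  files <- files[basename(files) != "newfile.xlsx"]
  df <- tibble()
  for (file in files) {
    temp <- read_excel(file)
    df <- bind_rows(df, temp)
  }
  # creates a newfile.xlsx in each folder
  openxlsx::write.xlsx(df, file.path(folder, "newfile.xlsx"))
  # alternative: creates the newfile in the main folder
  # openxlsx::write.xlsx(df, paste0(folder, "_newfile.xlsx"))
}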
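If you would rather keep the loop in the shell, as the question proposes, a minimal sketch could look like the following. It assumes the R script reads and writes paths relative to the working directory (as the script in the question does), so no setwd() call is needed. The function name run_in_folders and the RSCRIPT_BIN override are hypothetical names introduced here for illustration; they are not part of R.

```shell
# run_in_folders SCRIPT FOLDER...
# Runs one R script with each folder in turn as the working directory.
# RSCRIPT_BIN is a hypothetical override (useful for dry runs); it defaults to Rscript.
run_in_folders() {
  local script
  # Resolve to an absolute path so the script is still found after cd.
  script="$(realpath "$1")" || return 1
  shift
  local dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "skipping $dir: not a directory" >&2
      continue
    fi
    # The subshell keeps the cd from leaking into the next iteration.
    ( cd "$dir" && "${RSCRIPT_BIN:-Rscript}" "$script" ) || echo "failed in $dir" >&2
  done
}
```

Called as run_in_folders myRscript.R folder1 folder2 folder3 from the parent directory, this should leave one newfile.xlsx in each folder while keeping a single copy of the script.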

Related

Iterate the txt files in several subfolders under the folder

I am new to R. I have several .txt files in several sub-folders under one folder. The structure of the .txt files is the same.
I would like to iterate over the files in each sub-folder and generate a single file for the whole folder.
My code so far:
parent.folder <- "C:/.../18_0101" # Folder containing sub-folders
sub.folders <- list.dirs(parent.folder, recursive=TRUE)[-1] # Sub-folders
r.scripts <- file.path(sub.folders)
HR_2018 <- list()
for (j in seq_along(r.scripts)) {
  HR_2018[[j]] <- dir(r.scripts[j], "\\.txt$")
}
When I check HR_2018[[1]], I get only the list of .txt files in the first sub-folder. From there, I would like to analyze the files in each sub-folder, repeat the same process for the other sub-folders under the folder, and generate a single file.
Can anyone help?

Unzipping Files in R Using a Loop or Lapply and Saving to the Same Folder

I have a data frame that contains all of my directory and sub-directory file paths. My first dataset contains the input 'path':
Input:
\\data\A\New Jersey\Construction\2020.10.27\Results.zip
\\data\A\New Jersey\Materials\2020.10.27\Results.zip
\\data\A\Pennsylvania\Construction\2020.10.27\Results.zip
\\data\A\Pennsylvania\Electrician\2020.10.27\Results.zip
My second dataset contains the 'output' path:
Output:
\\data\A\New Jersey\Construction\2020.10.27
\\data\A\New Jersey\Materials\2020.10.27
\\data\A\Pennsylvania\Construction\2020.10.27
\\data\A\Pennsylvania\Electrician\2020.10.27
As you can see, I want the files unzipped to the same folder. Currently I can use
lapply(list, unzip)
to unzip to my working directory; however, I would like to unzip the file to the same location of the zip file. To do this I believe that I need to use a for loop and define the exdir using the 'output' dataset.
for(i in length(input)){
for(f in length(output)){
path <- input[i]
out <- output[f]
unzip(path, out)
}}
Any suggestions?

loop in all files in a directory with r script

I have a directory that contains several subfolders and other files. I need to access each subfolder, read its .tsv file, and run the following R script on it. How can I loop this R script and run it from the terminal?
for(i in my_files){
s <- read.csv('abundance.tsv',sep = '\t')
colnames(compare)[1] <- 'target_id'
colnames(s)[1] <- 'target_id'
s1 <- merge(compare, s, by = "target_id")
output.filename <- gsub("(.*?)", "\\1.csv", i)
write.table(s1, output.filename)
}
list.dirs() returns the directories in the given path and list.files() returns the files in a given path; see here for the documentation.
list.dirs() can be recursive or not, so you can either get only the first-level directories and then call list.dirs() again on each sub-directory (inside a loop), or get all the sub-directories directly.
With these two functions you can build your my_files vector (since I do not know your exact directory structure, I can't give an example).
If you have multiple files and want to open only some of them, you can check whether the file name contains a sub-string you want (e.g. the file extension). The way to do it is shown here.
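For the "run it from the terminal" part of the question, one option is to let the shell find the subfolders and call the R script once per folder. The sketch below assumes every folder of interest contains a file named abundance.tsv, as in the question, and that the R script takes the folder as its first argument (for example via commandArgs(trailingOnly = TRUE)[1]). The function names list_tsv_dirs and run_on_tsv_dirs and the RSCRIPT_BIN override are hypothetical names introduced here for illustration.

```shell
# list_tsv_dirs ROOT [NAME]
# Prints every directory under ROOT that contains a file called NAME
# (default: abundance.tsv), one per line, sorted.
list_tsv_dirs() {
  find "$1" -type f -name "${2:-abundance.tsv}" -exec dirname {} \; | sort
}

# run_on_tsv_dirs SCRIPT ROOT
# Calls the R script once per directory found, passing the directory as its first argument.
# RSCRIPT_BIN is a hypothetical override (useful for dry runs); it defaults to Rscript.
run_on_tsv_dirs() {
  list_tsv_dirs "$2" | while IFS= read -r dir; do
    # stdin is redirected so the command cannot swallow the remaining directory list
    "${RSCRIPT_BIN:-Rscript}" "$1" "$dir" </dev/null || echo "failed for $dir" >&2
  done
}
```

Inside the R script, the folder argument could then be combined with file.path() to build both the input path and the output file name, so each subfolder gets its own result instead of every iteration reading the same 'abundance.tsv'.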

Zip files without directory name in R

Inside my working directory I have folders whose names end in "_txt", each containing files. I want to zip each folder under its original name, with its files inside. Everything works, but the .zip also contains the directory name, which I don't want: "1202_txt.zip\1202_txt\files" needs to be "1202_txt.zip\files".
dir.create("1202_txt") # creating folder inside working directory
array <- list.files( , "*_txt")
for (i in 1:length(array)){
name <- paste0(array[i],".zip")
#zip(name, files = paste0(d,paste0("/",array[i])))
zip(name, files = array[i])
}
The code above comes from "Creating zip file from folders in R".
Note: Empty folders can be skipped
Can you please try this? (using R 3.5.0, macOS High Sierra 10.13.6)
dir_array <- list.files(getwd(), "*_txt")
zip_files <- function(dir_name){
zip_name <- paste0(dir_name, ".zip")
zip(zipfile = zip_name, files = dir_name)
}
Map(zip_files, dir_array)
This should zip all the folders inside the current working directory with the specified name. The zipped folders are also housed in the current working directory.
Here is the approach I used to get the result I wanted. It is a bit of a hack, but it works:
setwd("c:/test")
dir.create("1202_txt") # creating folder inside working directory and some CSV files in there
# "_txt$" matches only the folders, not the .zip files left by an earlier run
array <- list.files( , "_txt$")
for (i in seq_along(array)){
  name <- paste0(array[i], ".zip") # a single file name, so it is not indexed below
  # absolute path, because the working directory changes inside the loop
  Zip_Files <- list.files(path = file.path("c:/test", array[i]), pattern = "\\.csv$")
  # Moving Working Directory
  setwd(file.path("c:/test", array[i]))
  # zipping files inside the directory
  zip::zip(zipfile = name, files = Zip_Files)
  # Moving zip File from Inside folder to outside
  file.rename(name, file.path("c:/test", name))
  print(name)
}

Unzip and rename files keeping original file extension

I want to unzip the files inside a folder and rename them with the same name as their .zip file of origin BUT keeping the original extension of the individual files. Any ideas on how to do this?
Reproducible example:
# Download zip files
ftppath1 <- "ftp://geoftp.ibge.gov.br/malhas_digitais/censo_2010/setores_censitarios/se/se_setores_censitarios.zip"
ftppath2 <- "ftp://geoftp.ibge.gov.br/malhas_digitais/censo_2010/setores_censitarios/al/al_setores_censitarios.zip"
download.file(ftppath1, "SE.zip", mode="wb")
download.file(ftppath2, "AL.zip", mode="wb")
What I had in mind was something as naive as this:
# unzip and rename files
unzip("SE.zip", file_name= paste0("SE",.originalextension))
unzip("AL.zip", file_name= paste0("AL",.originalextension))
In the end, these are the files I would have in my folder:
SE.zip
AL.zip
AL.shx
AL.shp
AL.prj
AL.dbf
SE.shx
SE.shp
SE.prj
SE.dbf
for (stem in c('SE', 'AL')) {
  zf <- paste0(stem, '.zip')            ## derive zip file name
  unzip(zf)                             ## extract all compressed files
  files <- unzip(zf, list = TRUE)$Name  ## get their orig names
  for (file in files) file.rename(file, paste0(stem, '.', sub('.*\\.', '', file)))  ## rename
}
system('ls;');
## AL.dbf AL.prj AL.shp AL.shx AL.zip SE.dbf SE.prj SE.shp SE.shx SE.zip

Resources