Using lapply to source multiple R scripts in sub-directories - r

These are the folders in my directory
128 128-1-32 16384 16384-1-36 4096-1 512 512-1-65 65536-1
128-1 128tbw1 16384-1 4096 4096-1-36 512-1 65536
Each of them has an a7.R script that loads files from its folder and creates images. I want my script to enter each of the folders, then run
source('a7.R')
then exit that folder and repeat the process for all the folders. I am doing this manually now and it is really tedious. Is this possible with R?
I have tried a solution like this:
#!/usr/bin/Rscript
lapply(list.files(full.names=TRUE, recursive = TRUE, pattern = "^a7\\.R$"), source)
milenko@milenko-desktop:~/jbirp/mt07$ Rscript s.R
list()
The coffeinejunky's solution is not working
#!/usr/bin/Rscript
foo <- function(directory) { setwd(directory); source(a7.R) }
do.call("foo", list(directory= 128 128-1-32 16384 16384-1-36 4096-1 512 512-1-65 65536-1 128-1 128tbw1 16384-1 4096 4096-1-36 512-1 65536))
source('n.R')
Error in source("n.R") : n.R:2:33: unexpected numeric constant
1: foo <- function(directory) { setwd(directory); source(a7.R) }
2: do.call("foo", c(directory= 128 128
If I change the list like this
do.call("foo", list(directory= "./128" "./128-1" "./128-1-32" "./128tbw1" "./16384" "./16384-1" "./16384-1-36" "./4096" "./4096-1" "./4096-1-36" "./512" "./512-1" "./512-1-65" "./65536" "./65536-1"))
I got
Error in source("n.R") : n.R:2:40: unexpected string constant
1: foo <- function(directory) { setwd(directory); source(a7.R) }
2: do.call("foo", list(directory= "./128" "./128-1"
^
This is what I got when I list path
> list.dirs(path = ".", full.names = TRUE)
[1] "." "./128" "./128-1" "./128-1-32" "./128tbw1"
[6] "./16384" "./16384-1" "./16384-1-36" "./4096" "./4096-1"
[11] "./4096-1-36" "./512" "./512-1" "./512-1-65" "./65536"
[16] "./65536-1"
I need to change directory multiple times and perform the same operation in each of them. Is lapply good for this or not?

The following should work:
directories <- list.dirs(path = ".", full.names = TRUE)
# make sure this contains only the relevant directories;
# list.dirs() also returns "." itself, so remove any directory without an a7.R
foo <- function(x) {
  old <- setwd(x) # this stores the old directory and changes into the new one
  on.exit(setwd(old)) # restore the old directory even if a7.R fails
  source("a7.R")
}
lapply(directories, foo)
Alternatively,
for(folder in directories) foo(folder)

This will source every a7.R file with the working directory temporarily set to the sourced file's folder.
a7files <- list.files(full.names=TRUE, recursive = TRUE, pattern = "^a7\\.R$")
sapply(a7files, source, chdir = TRUE)
From ?source
chdir logical; if TRUE and file is a pathname, the R working directory is temporarily changed to the directory containing file for evaluating.
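As a rough illustration of the chdir = TRUE behaviour, here is a minimal sketch that runs against a throwaway directory tree. The folder names and the ran_in.txt marker file are placeholders invented for the demo; each stand-in a7.R just records the folder it was evaluated in.

```r
# Build a throwaway directory tree; the folder names are placeholders.
root <- tempfile("source_demo_")
subdirs <- file.path(root, c("128", "128-1", "512"))
for (d in subdirs) {
  dir.create(d, recursive = TRUE)
  # Each stand-in a7.R records the working directory it was evaluated in.
  writeLines('writeLines(basename(getwd()), "ran_in.txt")', file.path(d, "a7.R"))
}

# Find every a7.R below root and source it from its own folder.
a7files <- list.files(root, pattern = "^a7\\.R$", recursive = TRUE, full.names = TRUE)
start_dir <- getwd()
for (f in a7files) {
  source(f, chdir = TRUE)
}

# Each folder now holds a marker file naming that folder.
markers <- vapply(subdirs, function(d) readLines(file.path(d, "ran_in.txt")), character(1))
print(unname(markers))
```

Passing an explicit root to list.files, rather than relying on the current working directory, makes it easier to see which tree is being searched when the script is launched with Rscript.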

Related

How to call R script from command line with multiple argument types (inc. list)

I have been working on this for a while and I am still stuck.
I would like to call an Rscript using multiple arguments from what is essentially a command line (Snakemake file). The main difference between what I am asking and what I see on SO (How to pass list of arguments to method in R?, How can I pass an array as argument to an R script command line run?, Is it possible to pass an entire list as a command line argument in R) is that my arguments are a combination of strings, numbers, and a list.
Here is the set up in my rules (Snakemake file):
rule cluster_plots_DGE:
input:
script = 'src/scripts/create_images_DGE.R',
analyze_sc_object_output = sc_objects
params:
project = PROJECT,
method = METHOD,
rpath = RPATH,
storage=STORAGE,
components = COMPONENTS,
reso_file = resolution_file,
sample_files = integrated_seurat_objects
output: dge_files
log:
log_output = log_directory + PROJECT.lower() + '_DGE.log'
shell:
"Rscript {input.script} {params.project} {params.method} {params.rpath} {params.storage} {params.components} {params.reso_file} {params.sample_files} 2> {log.log_output}"
Here is what the call translates to:
Rscript src/scripts/create_images_DGE.R project_name ALL path_to_R_installed_libraries rds 50 data/endpoints/project_name/analysis/PCA_14/tables/project_nameR_resolution_list.txt data/endpoints/project_name/analysis/PCA_14/RDS/project_name_Standard_0.5.RDS data/endpoints/project_name/analysis/PCA_14/RDS/project_name_RPCA_0.5.RDS data/endpoints/project_name/analysis/PCA_14/RDS/project_name_SCT_0.5.RDS 2> logs/DGE_Markers/project_name_DGE.log
Where sample_files = integrated_seurat_objects is a list containing:
data/endpoints/project_name/analysis/PCA_14/RDS/project_name_Standard_0.5.RDS,
data/endpoints/project_name/analysis/PCA_14/RDS/project_name_RPCA_0.5.RDS,
data/endpoints/project_name/analysis/PCA_14/RDS/project_name_SCT_0.5.RDS
And here is the beginning of my R script:
args = commandArgs(trailingOnly=TRUE)
compo <- ''
project <- ''
method <- ''
lib_path <- ''
storage <- ''
res_file <- ''
integrated_object <- '' #list of objects
# test if there is at least 7 arguments: if not, return an error
if (length(args) < 7) {
stop('At least seven arguments must be supplied.', call.=FALSE)
} else if (length(args)==7) {
project = args[1]
method = args[2]
lib_path = args[3]
storage = args[4]
compo = args[5]
res_file = args[6]
integrated_object = args[7]
#integrated_object = eval(parse(text=args[7]))
}
print(compo)
print(project)
print(method)
print(lib_path)
print(storage)
print(res_file)
print(integrated_object)
If I use the entire integrated_seurat_objects list, this is what gets returned:
[1] ""
[1] ""
[1] ""
[1] ""
[1] ""
[1] ""
[1] ""
If I take the first entry from integrated_seurat_objects and pass that as an argument, I get (I have replaced the actual project name and paths in this post):
[1] "50"
[1] project_name
[1] "ALL"
[1] library_path_to_R_libraries
[1] "rds"
[1] "data/endpoints/project_name/analysis/PCA_14/tables/project_name_resolution_list.txt"
[1] "data/endpoints/project_name/analysis/PCA_14/RDS/project_name_Standard_0.5.RDS"
It seems doable, but I have not cracked it yet. How can I pass multiple arguments that include a list to an R script from the command line? Any assistance is always appreciated.
@MrFlick deserves credit for this answer. The issue was that I was not accounting for the case where the number of arguments is greater than 7 (duh).
A very quick fix:
if (length(args) < 7)
{
stop('At least seven arguments must be supplied.', call.=FALSE)
}
if (length(args)==7)
{
project = args[1]
method = args[2]
lib_path = args[3]
storage = args[4]
compo = args[5]
res_file = args[6]
integrated_object = args[7]
}
if (length(args)>7)
{
project = args[1]
method = args[2]
lib_path = args[3]
storage = args[4]
compo = args[5]
res_file = args[6]
integrated_object = args[7:length(args)]
}
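Since the first six positions are the same in both branches, the two cases can probably be folded into one. The sketch below is only an illustration: parse_dge_args is a hypothetical helper name, the example argument values are made up, and converting compo to an integer is an assumption the original script does not make.

```r
# parse_dge_args() is a hypothetical helper, not part of the original script.
parse_dge_args <- function(args) {
  if (length(args) < 7) {
    stop("At least seven arguments must be supplied.", call. = FALSE)
  }
  list(
    project = args[1],
    method = args[2],
    lib_path = args[3],
    storage = args[4],
    compo = as.integer(args[5]),  # assumption: the component count is wanted as a number
    res_file = args[6],
    # Everything from position 7 onward is treated as the list of objects.
    integrated_object = args[7:length(args)]
  )
}

# In the real script: parsed <- parse_dge_args(commandArgs(trailingOnly = TRUE))
example_args <- c("project_name", "ALL", "lib/path", "rds", "50",
                  "resolution_list.txt", "a.RDS", "b.RDS", "c.RDS")
parsed <- parse_dge_args(example_args)
print(parsed$integrated_object)
```

With exactly seven arguments, args[7:length(args)] is just args[7], so the single-file case needs no separate branch.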
Thank you for your eyes, @MrFlick

Using functions in R to copy the same file from multiple folders and renaming the file based on folder structure

I am new to R, and have built a working series of loops that copies and renames a single file from multiple subfolders.
Great it works!! ... but what is this question about then?
Well, I have learned enough of R to know that most users agree that one should avoid loops and use functions instead whenever possible. This is because functions are a lot faster than loops.
When I apply my loop to my real data, it will have to cycle through +10000 files, and I would therefore like to make it as fast as possible.
Can my loop be expressed by using functions instead?
Alternatively can the loop be optimized in some way?
Any examples or suggestions on how to achieve any of the above will be much appreciated.
Info on my file-structure:
F:/Study
    1000
        1001
            Gait1
                annotations.txt
            Gait2
                annotations.txt
        1002
            Gait1
                annotations.txt
            Gait2
                annotations.txt
    59000
        59003
            Gait1
                annotations.txt
            Gait2
                annotations.txt
nb. the "Gait" folders contain many other files and directories, but I am only interested in the "annotations.txt" files.
My loop:
for( a in seq(1000, 99000, by = 1000) ) {
  if (!dir.exists(paste0("F:/Study/", a))) {
    next()
  }
  for ( b in seq(1, 200, by = 1) ) {
    if (!dir.exists(paste0("F:/Study/", a,"/", a+b))) {
      next()
    }
    for ( c in seq(1, 2, by = 1)) {
      if (!dir.exists(paste0("F:/Study/", a,"/", a+b, "/", "Gait", c))) {
        next()
      }
      file.copy(from = paste0("F:/Study/", a,"/", a+b, "/", "Gait", c,"/annotations.txt"), to = "F:/Study/Annotations", overwrite = TRUE)
      setwd("F:/Study/Annotations")
      file.rename( from = "annotations.txt", to = paste0(a+b, "_Gait", c, "_annotations.txt") )
    }
  }
}
Result is files in my Annotations folder called:
1001_Gait1_annotations
1001_Gait2_annotations
1002_Gait1_annotations
1002_Gait2_annotations
59003_Gait1_annotations
59003_Gait2_annotations
tl;dr Can the loop be expressed using functions? How?
You might try the following (I have assumed that your /Annotations directory already exists). Does that work for you?
# get all file names (full paths) matching the pattern "annotations" in all folders below the directory
files <- list.files("F:/Study/", full.names = TRUE, recursive = TRUE, pattern = "annotations")
# extract the desired parts for the new filename with a regex:
# \\d+ matches one or more digits, .* matches anything and $ marks the end of the string
# keep only the second capturing group (the part within the second pair of parentheses) via \\2
filenames_new <- gsub("(.*/)(\\d+/Gait\\d+.*$)", "\\2", files)
filenames_new <- gsub("/", "_", filenames_new)
# create the new file paths
files_new <- paste0("F:/Study/Annotations/", filenames_new)
# perform the copy
file.copy(from = files, to = files_new)
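For anyone who wants to try the vectorised approach without touching real data, here is a minimal sketch on a temporary tree. It swaps the regex for basename/dirname calls, which is a different technique from the answer above, and it anchors the pattern to the exact name annotations.txt so that files already copied into the Annotations folder are not picked up again. The subject and Gait folder names are placeholders.

```r
# Throwaway tree mirroring the layout in the question (names are placeholders).
root <- tempfile("study_demo_")
gait_dirs <- file.path(root, c("1000/1001/Gait1", "1000/1001/Gait2", "59000/59003/Gait1"))
for (d in gait_dirs) {
  dir.create(d, recursive = TRUE)
  writeLines("demo", file.path(d, "annotations.txt"))
}
out_dir <- file.path(root, "Annotations")
dir.create(out_dir)

# Match only files named exactly annotations.txt.
files <- list.files(root, pattern = "^annotations\\.txt$", recursive = TRUE, full.names = TRUE)

# Build "<subject>_<GaitN>_annotations.txt" from the last two folder names.
gait <- basename(dirname(files))
subject <- basename(dirname(dirname(files)))
new_names <- paste0(subject, "_", gait, "_", basename(files))

# file.copy is vectorised: one call copies every file to its own target.
copied <- file.copy(from = files, to = file.path(out_dir, new_names), overwrite = TRUE)
print(sort(list.files(out_dir)))
```

With 10000+ files, most of the time is likely spent on disk I/O rather than on the loop itself, so the main gain here is one directory scan instead of thousands of dir.exists checks.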

List directories up to the level of my files only

This line gives me hierarchical directory paths down to the files:
dirs<- as.data.frame(list.dirs(path = rootdir, full.names = F, recursive = T))
Like so:
"","list.dirs(path = rootdir, full.names = F, recursive = T)"
"1",""
"2","19"
"3","19/H"
"4","19/H/BA"
"5","19/H/BA/2016"
"6","19/H/BA/2016/11"
"7","19/H/BA/2016/11/10"
"8","19/H/BA/2016/11/10/0" # files are in here
"9","19/H/BA/2016/12"
"10","19/H/BA/2016/12/20"
"11","19/H/BA/2016/12/20/0" # files are in here
"12","19/H/BA/2017"
"13","19/H/BA/2017/1"
"14","19/H/BA/2017/1/19"
"15","19/H/BA/2017/1/19/0" # files are in here
"16","19/H/BA/2017/1/29"
"17","19/H/BA/2017/1/29/0" # files are in here
"18","19/H/BA/2017/3"
"19","19/H/BA/2017/3/20"
"20","19/H/BA/2017/3/20/0" # files are in here
But how would I write the code to only give me the paths to the files? I.e.,
"19/H/BA/2016/11/10/0"
"19/H/BA/2016/12/20/0"
"19/H/BA/2017/1/19/0"
"19/H/BA/2017/1/29/0"
"19/H/BA/2017/3/20/0"
You can use dirname instead of a regular expression; this will handle special cases like rootdir == "C:/" or rootdir == "../":
unique(dirname(list.files(rootdir, recursive = TRUE)))
We can use list.files to get the paths of all the files that are present (so that it doesn't give us any empty directory paths).
filepath = list.files(rootdir, recursive = TRUE)
Now this holds the path to every file; we can use sub to remove the file names and keep only the directory part.
sub("/[^/]*$", "", filepath)
This removes everything from the last / onward. Finally, to avoid duplicates we can take unique of it.
To do everything in a one-liner:
unique(sub("/[^/]*$", "", list.files(rootdir, recursive = TRUE)))
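To check the idea on a small case, the sketch below builds a temporary tree with two leaf folders (the folder and file names are placeholders) and compares dirname with a sub call that strips only the last path component.

```r
# Throwaway tree with files only in the deepest folders (placeholder names).
rootdir <- tempfile("dirs_demo_")
leaf_dirs <- c("19/H/BA/2016/11/10/0", "19/H/BA/2017/1/19/0")
for (d in file.path(rootdir, leaf_dirs)) {
  dir.create(d, recursive = TRUE)
  file.create(file.path(d, c("a.txt", "b.txt")))
}

# Relative paths of every file below rootdir.
filepath <- list.files(rootdir, recursive = TRUE)

# Two ways to drop the file name and keep the containing directory.
via_dirname <- unique(dirname(filepath))
via_sub <- unique(sub("/[^/]*$", "", filepath))
print(via_dirname)
```

The two approaches differ for files sitting directly in rootdir: dirname returns "." for them, while the sub pattern finds no slash and leaves the bare file name in place.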

How to save all images in a separate folder?

So, I am running the following code:
dirtyFolder = "Myfolder/test"
filenames = list.files(dirtyFolder, pattern="*.png")
for (f in filenames)
{
print(f)
imgX = readPNG(file.path(dirtyFolder, f))
x = data.table(img2vec(imgX), kmeansThreshold(imgX))
setnames(x, c("raw", "thresholded"))
yHat = predict(gbm.mod, newdata=x, n.trees = best.iter)
img = matrix(yHat, nrow(imgX), ncol(imgX))
img.dt=data.table(melt(img))
names.dt<-names(img.dt)
setnames(img.dt,names.dt[1],"X1")
setnames(img.dt,names.dt[2],"X2")
Numfile = gsub(".png", "", f, fixed=TRUE)
img.dt[,id:=paste(Numfile,X1,X2,sep="_")]
write.table(img.dt[,c("id","value"),with=FALSE], file = "submission.csv", sep = ",", col.names = (f == filenames[1]),row.names = FALSE,quote = FALSE,append=(f != filenames[1]))
# show a sample
if (f == "4.png")
{
writePNG(imgX, "train_101.png")
writePNG(img, "train_cleaned_101.png")
}
}
Basically, it takes noisy images as input and removes the noise from them. This is only the latter part of the code, which applies the algorithm prepared from a training dataset (not shown here).
Now, I am not able to figure out, how can I save the cleaned image for each of the images in the test folder. That is, I wish to save the cleaned image for each of the images in the folder and not just the 4.png image. The output image should have the name as 4_cleaned.png if the input image has the name 4.png and it should be saved in a separate folder in the same directory. That is, if input image has the name x.png, the output image should have the name x_cleaned.png and saved in a separate folder. How can I do it?
Tldr; I just want to save the variable named img for each of the filename as number_cleaned.png where number corresponds to the original file name. These new files should be saved in a separate folder.
Alright, so construct the output filename using file.path and a function such as paste or sprintf:
folder_name = 'test'
output_filename_pattern = file.path(folder_name, '%s_cleaned.png')
# strip the final extension, e.g. "4.png" becomes "4"
remove_extension = function (filename)
    sub('\\.[^.]*$', '', filename)
for (f in filenames) {
    # … your code here …
    new_filename = sprintf(output_filename_pattern, remove_extension(f))
    # … save file here, e.g. writePNG(img, new_filename) …
}
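A minimal sketch of the naming step on its own, assuming the output should go to a separate folder that may not exist yet. cleaned_path is a hypothetical helper and "test_cleaned" is a placeholder folder name; tools::file_path_sans_ext from base R's tools package is used here in place of a hand-written regex.

```r
# cleaned_path() is a hypothetical helper; "test_cleaned" is a placeholder folder name.
cleaned_path <- function(filename, out_dir) {
  stem <- tools::file_path_sans_ext(basename(filename))
  file.path(out_dir, paste0(stem, "_cleaned.png"))
}

# Create the output folder once, before the loop.
out_dir <- file.path(tempdir(), "test_cleaned")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

filenames <- c("4.png", "101.png")
for (f in filenames) {
  out_file <- cleaned_path(f, out_dir)
  # In the real loop this would be: writePNG(img, out_file)
  print(basename(out_file))
}
```

Creating the folder up front matters because writing to a path inside a directory that does not exist will fail.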

use R to loop through subdirectories and copy files

I am trying to create a batch script in R to pre-process some data and one of the first steps I have to do is check to see if a file exists in a sub-directory and then (if it does) create a copy of it with a new name. I'm having trouble with the syntax.
This is my code:
##Define the subject directory path
sDIR = "/home/bsussman/Desktop/WORKSPACE"
#create data frame to loop through
##list of subject directories
subjects <-list.dirs(path = sDIR, full.names = TRUE, recursive = FALSE)
for (subj in 1:length(subjects)){
oldT1[[subj]] <- dir(subjects[subj], pattern=glob2rx("s*.nii"), full.names=TRUE)
T1[[subj]] <- paste(subjects[subj], pattern="/T1.nii",sep="")
if (file.exists(paste(subjects[subj], pattern="/T1.nii",sep=""))=FALSE{
file.copy(oldT1, T1)
}
}
It renames files in one subdirectory, but it will not loop through the rest and gives me these errors:
Error: unexpected '=' in:
"
if (file.exists(paste(subjects[subj], pattern="/T1.nii",sep=""))="
> file.copy(oldT1, T1)
[1] FALSE
> }
Error: unexpected '}' in " }"
> }
Error: unexpected '}' in "}"
I am not as much worried about the [1]FALSE message. But any ideas?
Thanks!!
It's just a problem with the syntax in the if statement. Try replacing this:
if (file.exists(paste(subjects[subj], pattern="/T1.nii",sep=""))=FALSE{
file.copy(oldT1, T1)
}
with this:
if (!file.exists(paste(subjects[subj], pattern="/T1.nii",sep=""))){
file.copy(oldT1, T1)
}
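Beyond the syntax fix, the loop may behave more predictably if each subject's paths are kept in plain local variables rather than in the oldT1 and T1 lists, so that file.copy only ever sees one source and one target. The sketch below is one way to do that on a throwaway directory; the subject folder names and file contents are placeholders, and taking the first match when several s*.nii files exist is an assumption.

```r
# Throwaway workspace; subject names and file contents are placeholders.
sDIR <- tempfile("workspace_demo_")
dir.create(file.path(sDIR, "sub01"), recursive = TRUE)
dir.create(file.path(sDIR, "sub02"), recursive = TRUE)
writeLines("scan", file.path(sDIR, "sub01", "s001.nii"))      # sub01 has only a scan
writeLines("existing", file.path(sDIR, "sub02", "T1.nii"))    # sub02 already has T1.nii
writeLines("scan", file.path(sDIR, "sub02", "s002.nii"))

subjects <- list.dirs(path = sDIR, full.names = TRUE, recursive = FALSE)
for (subj in subjects) {
  oldT1 <- dir(subj, pattern = glob2rx("s*.nii"), full.names = TRUE)
  T1 <- file.path(subj, "T1.nii")
  # Copy only when a source scan exists and T1.nii is not there yet.
  if (length(oldT1) > 0 && !file.exists(T1)) {
    file.copy(oldT1[1], T1)  # assumption: use the first match if there are several
  }
}
print(list.files(sDIR, recursive = TRUE))
```

By default file.copy does not overwrite an existing target and returns FALSE in that case, which would explain a [1] FALSE result on a second run.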
