How do I rename files using R? - r

I have over 700 files in one folder, named as follows.
For the first month, files number 1 to number 9 are named:
water_200101_01.img
water_200101_09.img
Files number 10 to number 30 are named:
water_200101_10.img
water_200101_30.img
And so on for the second month:
Files number 1 to number 9 are named:
water_200102_01.img
water_200102_09.img
Files number 10 to number 30 are named:
water_200102_10.img
water_200102_30.img
How can I rename them without making any changes to the files themselves? I just want to change the names, for example:
water_1
water_2
...till...
water_700

file.rename will rename files, and it can take a vector of both from and to names.
So something like:
old <- list.files(pattern = "^water_.*\\.img$")  # pattern is a regular expression, not a shell glob
file.rename(old, paste0("water_", seq_along(old)))
might work.
If you care about the order specifically, you could either sort the list of files that currently exist or, if they follow a particular pattern, just create the vector of filenames directly (although I note that 700 is not a multiple of 30).
I will set aside the question, "why would you want to?" since you seem to be throwing away information in the filename, but presumably that information is contained elsewhere as well.
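A minimal sketch of the sort-then-rename approach, run in a scratch directory so no real files are touched. The scratch folder and the three sample file names are assumptions made up for illustration; only list.files, file.rename and friends from base R are used.

```r
# Throwaway directory with a few sample files (hypothetical data)
tmp <- tempfile("rename_demo")
dir.create(tmp)
file.create(file.path(tmp, c("water_200101_01.img",
                             "water_200101_10.img",
                             "water_200102_01.img")))

# list.files() takes a regular expression; sort() makes the order explicit
old <- sort(list.files(tmp, pattern = "^water_.*\\.img$"))
new <- paste0("water_", seq_along(old))

# Preview the mapping before renaming anything
print(data.frame(old, new))

# Rename using full paths; file.rename() returns one logical per file
ok <- file.rename(file.path(tmp, old), file.path(tmp, new))
```

Because the month and day are zero-padded, alphabetical order matches chronological order here, so water_1 is the earliest file.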

I wrote this for myself. It is fast, allows regex in find and replace, can ignore the file suffix, and can show what would happen in a "trial run" as well as protect against over-writing existing files.
If you are on a Mac, it can use AppleScript to pick the folder currently open in the Finder as the target folder.
umx_rename_file <- function(findStr = "Finder", replaceStr = NA, baseFolder = "Finder", test = TRUE, ignoreSuffix = TRUE, listPattern = NULL, overwrite = FALSE) {
  # Needs umx_check() and umx_msg() from the umx package and omxQuotes() from OpenMx
  umx_check(!is.na(replaceStr), "stop", "Please set a replaceStr to the replacement string you desire.")
  # ==============================
  # = 1. Set folder to search in =
  # ==============================
  if(baseFolder == "Finder"){
    baseFolder = system(intern = TRUE, "osascript -e 'tell application \"Finder\" to get the POSIX path of (target of front window as alias)'")
    message("Using front-most Finder window: ", baseFolder)
  } else if(baseFolder == "") {
    baseFolder = paste(dirname(file.choose(new = FALSE)), "/", sep = "") ## choose a directory
    message("Using selected folder: ", baseFolder)
  }
  # =================================================
  # = 2. Find files matching listPattern or findStr =
  # =================================================
  a = list.files(baseFolder, pattern = listPattern)
  message("found ", length(a), " possible files")
  changed = 0
  for (fn in a) {
    if(grepl(pattern = findStr, fn, perl = TRUE)){
      if(ignoreSuffix){
        # pull suffix and baseName (without suffix)
        baseName = sub(pattern = "(.*)(\\..*)$", x = fn, replacement = "\\1")
        suffix = sub(pattern = "(.*)(\\..*)$", x = fn, replacement = "\\2")
        fnew = gsub(findStr, replacement = replaceStr, x = baseName, perl = TRUE) # replace all instances
        fnew = paste0(fnew, suffix)
      } else {
        fnew = gsub(findStr, replacement = replaceStr, x = fn, perl = TRUE) # replace all instances
      }
      if(test){
        message(fn, " would be changed to: ", omxQuotes(fnew))
      } else {
        # file.path() works whether or not baseFolder ends in a slash
        if((!overwrite) && file.exists(file.path(baseFolder, fnew))){
          message("renaming ", fn, " to ", fnew, " failed as it already exists. To overwrite, set overwrite = TRUE")
        } else {
          file.rename(file.path(baseFolder, fn), file.path(baseFolder, fnew))
          changed = changed + 1
        }
      }
    } else {
      if(test){
        # message(paste("bad file", fn))
      }
    }
  }
  if(test && changed == 0){
    message("set test = FALSE to actually change files.")
  } else {
    umx_msg(changed)
  }
}

To replace the part of each file name that matches a given pattern with another string, combine file.rename with str_replace from the stringr package. This is useful for renaming several files at once. For example, this code takes all files whose names contain foo and replaces foo with bob in the file names.
file.rename(list.files(pattern = "foo"), stringr::str_replace(list.files(pattern = "foo"), pattern = "foo", replacement = "bob"))
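The same replacement can be sketched in base R with sub(), with no extra package. The scratch directory and file names below are made up for illustration, and the check against existing targets is an added precaution rather than part of the answer above.

```r
# Throwaway directory with sample files (hypothetical names)
tmp <- tempfile("foo_demo")
dir.create(tmp)
file.create(file.path(tmp, c("foo_1.txt", "foo_2.txt", "other.txt")))

old <- list.files(tmp, pattern = "foo")
new <- sub("foo", "bob", old, fixed = TRUE)  # replace the first literal "foo"

# Refuse to clobber files that already exist
stopifnot(!any(file.exists(file.path(tmp, new))))
renamed <- file.rename(file.path(tmp, old), file.path(tmp, new))
```

Use gsub() instead of sub() if every occurrence of the pattern in a name should be replaced.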

The following was my workaround for matching files in sequence and changing all the filenames in a specified directory using simple base code.
old_files <- list.files(path = ".", pattern = "^water_.*\\.img$")
# Build the new names in the same order as old_files
new_files <- paste0("water_", seq_along(old_files), ".img")
# Rename the old files to the new names
file.rename(from = old_files, to = new_files)

Related

How to call R script from command line with multiple argument types (inc. list)

I have been working on this for a while and I am still stuck.
I would like to call an Rscript using multiple arguments from what is essentially a command line (Snakemake file). The main difference between what I am asking and what I see on SO (How to pass list of arguments to method in R?, How can I pass an array as argument to an R script command line run?, Is it possible to pass an entire list as a command line argument in R) is that my arguments are a combination of strings, numbers, and a list.
Here is the set up in my rules (Snakemake file):
rule cluster_plots_DGE:
input:
script = 'src/scripts/create_images_DGE.R',
analyze_sc_object_output = sc_objects
params:
project = PROJECT,
method = METHOD,
rpath = RPATH,
storage=STORAGE,
components = COMPONENTS,
reso_file = resolution_file,
sample_files = integrated_seurat_objects
output: dge_files
log:
log_output = log_directory + PROJECT.lower() + '_DGE.log'
shell:
"Rscript {input.script} {params.project} {params.method} {params.rpath} {params.storage} {params.components} {params.reso_file} {params.sample_files} 2> {log.log_output}"
Here is what the call translates to:
Rscript src/scripts/create_images_DGE.R project_name ALL path_to_R_installed_libraries rds 50 data/endpoints/project_name/analysis/PCA_14/tables/project_nameR_resolution_list.txt data/endpoints/project_name/analysis/PCA_14/RDS/project_name_Standard_0.5.RDS data/endpoints/project_name/analysis/PCA_14/RDS/project_name_RPCA_0.5.RDS data/endpoints/project_name/analysis/PCA_14/RDS/project_name_SCT_0.5.RDS 2> logs/DGE_Markers/project_name_DGE.log
Where sample_files = integrated_seurat_objects is a list containing:
data/endpoints/project_name/analysis/PCA_14/RDS/project_name_Standard_0.5.RDS,
data/endpoints/project_name/analysis/PCA_14/RDS/project_name_RPCA_0.5.RDS,
data/endpoints/project_name/analysis/PCA_14/RDS/project_name_SCT_0.5.RDS
And here is the beginning of my R script:
args = commandArgs(trailingOnly=TRUE)
compo <- ''
project <- ''
method <- ''
lib_path <- ''
storage <- ''
res_file <- ''
integrated_object <- '' #list of objects
# test if there is at least 7 arguments: if not, return an error
if (length(args) < 7) {
stop('At least seven arguments must be supplied.', call.=FALSE)
} else if (length(args)==7) {
project = args[1]
method = args[2]
lib_path = args[3]
storage = args[4]
compo = args[5]
res_file = args[6]
integrated_object = args[7]
#integrated_object = eval(parse(text=args[7]))
}
print(compo)
print(project)
print(method)
print(lib_path)
print(storage)
print(res_file)
print(integrated_object)
If I use the entire integrated_seurat_objects list, this is what gets returned:
[1] ""
[1] ""
[1] ""
[1] ""
[1] ""
[1] ""
[1] ""
If I take the first entry from integrated_seurat_objects and pass that as an argument, I get (I have replaced the actual project name and paths in this post):
[1] "50"
[1] project_name
[1] "ALL"
[1] library_path_to_R_libraries
[1] "rds"
[1] "data/endpoints/project_name/analysis/PCA_14/tables/project_name_resolution_list.txt"
[1] "data/endpoints/project_name/analysis/PCA_14/RDS/project_name_Standard_0.5.RDS"
It seems doable, but I have not cracked it yet. How can I pass multiple arguments that include a list to an R script from the command line? Any assistance is always appreciated.
@MrFlick deserves credit for this answer. The issue was that I was not accounting for the case where the number of arguments is greater than 7 (duh): each path in the list arrives as a separate argument, so with three paths neither branch of the if/else ran and every variable kept its empty default.
A very quick fix:
if (length(args) < 7)
{
stop('At least seven arguments must be supplied.', call.=FALSE)
}
if (length(args)==7)
{
project = args[1]
method = args[2]
lib_path = args[3]
storage = args[4]
compo = args[5]
res_file = args[6]
integrated_object = args[7]
}
if (length(args)>7)
{
project = args[1]
method = args[2]
lib_path = args[3]
storage = args[4]
compo = args[5]
res_file = args[6]
integrated_object = args[7:length(args)]
}
Thank you for your eyes, @MrFlick.
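Since the two non-error branches differ only in the last assignment, the fix can be folded into one helper. This is a sketch, not the script from the question: parse_args is a hypothetical name, and converting compo to an integer is an assumption about how the value is used later.

```r
# Hypothetical helper: the first six arguments are scalars, the rest are files
parse_args <- function(args) {
  if (length(args) < 7) {
    stop("At least seven arguments must be supplied.", call. = FALSE)
  }
  list(
    project = args[1],
    method = args[2],
    lib_path = args[3],
    storage = args[4],
    compo = as.integer(args[5]),              # assumption: used as a number
    res_file = args[6],
    integrated_object = args[7:length(args)]  # one or more paths
  )
}

# In the script: opts <- parse_args(commandArgs(trailingOnly = TRUE))
opts <- parse_args(c("project_name", "ALL", "lib", "rds", "50",
                     "resolution_list.txt", "a.RDS", "b.RDS", "c.RDS"))
print(opts$integrated_object)
```

args[7:length(args)] also covers the case of exactly seven arguments, so no separate branch is needed.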

dir + file.copy returns "file does not exist" warnings due to non-english characters, how can I fix this?

I have this function in RStudio to synchronize 2 folders on windows.
p1 and p2 are paths;
fsync<-function(p1,p2){
A<-dir(p1,all.files = T,recursive = T,ignore.case = T, include.dirs = F,full.names = T);
B<-dir(p2,all.files = T,recursive = T,ignore.case = T, include.dirs = F,full.names = T);
d1<-setdiff(A,B);
d2<-setdiff(B,A);
if(length(d1)!=0) file.copy(d1,p2,overwrite = F,recursive = T)
if(length(d2)!=0) file.copy(d2,p1,overwrite = F,recursive = T)
}
When I run it, it works, but it also shows warnings saying "the file does not exist" or "no such file or directory" (I'm not sure of the exact wording right now). I think this happens only with files containing non-English characters (e.g. á, é, ...). How can I make dir() handle these file names correctly?

R move whole folder to another directory

I would like to move the whole folder from one directory to another, this is my code,
folder_old_path = "C:/Users/abc/Downloads/managerA"
path_new = "C:/User/abc/Desktop/managerA"
current_files = list.files(folder_old_path, full.names = TRUE)
file.copy(from = current_files, to = path_new,
overwrite = recursive, recursive = FALSE, copy.mode = TRUE)
However, I am getting this error msg
Error in file.copy(from = current_files, to = path_new, overwrite = recursive, :
more 'from' files than 'to' files
any idea how to fix this? thank you so much for your help!
library(ff)
from <- "~/Path1/" #Current path of your folder
to <- "~/Path2/" #Path you want to move it.
path1 <- paste0(from,"NameOfMyFolder")
path2 <- paste0(to,"NameOfMyFolder")
file.move(path1,path2)
Try using this little code.
Easiest:
file.rename(folder_old_path, path_new)
If you want to check if path_new already exists you can expand the above to:
if (dir.exists(path_new)) {
  print(paste("already exists so recursively deleting path_new", path_new))
  unlink(path_new, recursive = TRUE)
}
It appears as though the current_files = list.files(folder_old_path, full.names = TRUE) step is unnecessary. If my understanding of the R file documentation is correct, then you should be able to just use the following:
folder_old_path = "C:/Users/abc/Downloads/managerA"
path_new = "C:/User/abc/Desktop/managerA"
# 'to' must be an existing directory; the folder is copied into it
file.copy(from = folder_old_path, to = dirname(path_new),
overwrite = TRUE, recursive = TRUE, copy.mode = TRUE)
If that doesn't work, then you'll have to create a new list of files (take each entry of current_files and swap folder_old_path for path_new) and call file.copy on those:
folder_old_path = "C:/Users/abc/Downloads/managerA"
path_new = "C:/User/abc/Desktop/managerA"
current_files = list.files(folder_old_path, full.names = TRUE)
# same file names, but located in path_new (subfolders are not handled here)
new_files = file.path(path_new, basename(current_files))
file.copy(from = current_files, to = new_files,
overwrite = TRUE, recursive = FALSE, copy.mode = TRUE)
... this all assumes (of course) that both folder_old_path and path_new exist and you have the correct permissions on them.
The linked page does contain a caveat/note about windows paths:
There is no guarantee that these functions will handle Windows
relative paths of the form d:path: try d:./path instead. In
particular, d: is not recognized as a directory. Nor are \\?\ prefixes
(and similar) supported.
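As a rough sketch of the copy-then-delete route, here is a whole-folder move between scratch directories. The folder layout is made up to stand in for Downloads/managerA and Desktop; it relies on file.copy copying a directory into an existing target directory when recursive = TRUE.

```r
# Scratch folders standing in for Downloads/managerA and Desktop (hypothetical)
root <- tempfile("move_demo")
src <- file.path(root, "Downloads", "managerA")
dest_parent <- file.path(root, "Desktop")
dir.create(src, recursive = TRUE)
dir.create(dest_parent, recursive = TRUE)
file.create(file.path(src, c("a.txt", "b.txt")))

# Copy the folder itself into the existing destination directory
copied <- file.copy(from = src, to = dest_parent, recursive = TRUE)

# Delete the source only after the copy is confirmed
if (all(copied)) unlink(src, recursive = TRUE)

moved <- list.files(file.path(dest_parent, "managerA"))
```

Unlike file.rename, this also works when source and destination are on different drives, at the cost of copying every file.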
On linux you should be able to simply:
1) make the OTHER_DIR if needed. If it is a subdirectory to OUTPUT_DIR then:
dir.create(file.path(OUTPUT_DIR, OTHER_DIR), showWarnings = FALSE)
setwd(file.path(OUTPUT_DIR, OTHER_DIR))
dir.create() will just print a warning if the directory exists. If you want to see the warning, just remove the showWarnings = FALSE.
If it is just another directory at the same level as OUTPUT_DIR then:
dir.create(OTHER_DIR)
2) Then move the file (e.g. if OTHER_DIR is at the same level as OUTPUT_DIR):
file.rename("C:/OUTPUT_DIR/file.csv", "C:/OTHER_DIR/file.csv")

How to save all images in a separate folder?

So, I am running the following code:
dirtyFolder = "Myfolder/test"
filenames = list.files(dirtyFolder, pattern="*.png")
for (f in filenames)
{
print(f)
imgX = readPNG(file.path(dirtyFolder, f))
x = data.table(img2vec(imgX), kmeansThreshold(imgX))
setnames(x, c("raw", "thresholded"))
yHat = predict(gbm.mod, newdata=x, n.trees = best.iter)
img = matrix(yHat, nrow(imgX), ncol(imgX))
img.dt=data.table(melt(img))
names.dt<-names(img.dt)
setnames(img.dt,names.dt[1],"X1")
setnames(img.dt,names.dt[2],"X2")
Numfile = gsub(".png", "", f, fixed=TRUE)
img.dt[,id:=paste(Numfile,X1,X2,sep="_")]
write.table(img.dt[,c("id","value"),with=FALSE], file = "submission.csv", sep = ",", col.names = (f == filenames[1]),row.names = FALSE,quote = FALSE,append=(f != filenames[1]))
# show a sample
if (f == "4.png")
{
writePNG(imgX, "train_101.png")
writePNG(img, "train_cleaned_101.png")
}
}
What it does is basically, takes as input images which have noise in them and removes noise from them. This is only the later part of the code which applies the algorithm prepared from a training dataset (not shown here).
Now, I am not able to figure out, how can I save the cleaned image for each of the images in the test folder. That is, I wish to save the cleaned image for each of the images in the folder and not just the 4.png image. The output image should have the name as 4_cleaned.png if the input image has the name 4.png and it should be saved in a separate folder in the same directory. That is, if input image has the name x.png, the output image should have the name x_cleaned.png and saved in a separate folder. How can I do it?
Tldr; I just want to save the variable named img for each of the filename as number_cleaned.png where number corresponds to the original file name. These new files should be saved in a separate folder.
Alright, so construct the output filename using file.path and a function such as paste or sprintf:
folder_name = 'test'
output_filename_pattern = file.path(folder_name, '%s_cleaned.png')
remove_extension = function (filename)
  gsub('\\.[^.]*$', '', filename)  # strip the last dot and everything after it
for (f in filenames) {
  # … your code here …
  new_filename = sprintf(output_filename_pattern, remove_extension(f))
  # … save file here …
}
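A short sketch of the naming step on its own, using tools::file_path_sans_ext in place of a hand-written regular expression. The input names and the output folder (a "cleaned" folder under tempdir()) are placeholders chosen for illustration.

```r
# Hypothetical input names and output folder
filenames <- c("4.png", "17.png")
out_dir <- file.path(tempdir(), "cleaned")
dir.create(out_dir, showWarnings = FALSE)  # no warning if it already exists

# x.png becomes <out_dir>/x_cleaned.png
out_files <- file.path(
  out_dir,
  paste0(tools::file_path_sans_ext(filenames), "_cleaned.png")
)
print(basename(out_files))
```

Inside the loop from the question, the cleaned matrix could then be written with a call such as writePNG(img, new_filename), in the same way the sample image is written there.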

How to rename files with a specific pattern in R?

There are some .fcs files in a data.000X format (where X = 1, 2, 3...) in a directory.
I want to rename every n file to the following format: exp.fcs (where exp is a text from a vector) if the file to be renamed is an .fcs file.
in other words: I want to rename files to exp.txt, where exp is a text and not a consecutive letter(s) i.e. F, cA, K, etc.
For example, from:
data.0001, data.0002, data.0003, data.0004, data.0005, data.0006...
to
textF_a.fcs, textF_b.fcs, textF_c.fcs, textVv_a.fcs, textVv_b.fcs, textVv_c.fcs ...
I tried to do it with file.rename(from, to) but failed as the arguments have different lengths (and I don't know what it means):
a <- list.files(path = ".", pattern = "data.*$")
b <- paste("data", 1:1180, ".fcs", sep = "")
file.rename(a, b)
Based on your comments, one issue is that your first file isn't named "data.001" - it's named "data.1". Use this:
b <- sprintf("data%.4d.fcs", seq_along(a))
It pads indices below 1000 with up to three leading zeros, so that all names have the same width (since it seems you have 1000+ files, this may be better). If you really just want to see things like "data.001", then use %.3d in the command.
Your code "works" on my machine ("works" in the sense that, when I created a set of files and followed your procedure, the renaming happened correctly). The error is likely that the number of files that you have (length(a)) is different from the number of new names that you give (length(b)). Post back if it turns out that these objects do have the same length.
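To get names like textF_a.fcs from the question, the new names can be built from a label vector and checked against the old names before renaming. This is only a sketch: a stands in for the list.files() result, and the labels and replicates vectors are guesses at how the target names are structured.

```r
# Hypothetical stand-ins for the list.files() output and the label vector
a <- sprintf("data.%04d", 1:6)
labels <- c("F", "Vv")
replicates <- c("a", "b", "c")

# Each label is repeated once per replicate: F_a, F_b, F_c, Vv_a, ...
b <- paste0("text", rep(labels, each = length(replicates)),
            "_", rep(replicates, times = length(labels)), ".fcs")

# file.rename() needs exactly one new name per old name
stopifnot(length(a) == length(b))
print(data.frame(from = a, to = b))
# file.rename(a, b)  # run once the preview looks right
```

If the stopifnot() line fails, the "different lengths" error from file.rename is explained: the folder holds more or fewer matching files than names were generated.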
As with the (very similar) question here, this function might be of use to you. I wrote it to allow regex find and replace in R. If you're on a Mac, it can detect and use the frontmost Finder window as a target. It also supports test runs, overwrite control, and filtering large folders.
umxRenameFile <- function(baseFolder = "Finder", findStr = NA, replaceStr = NA, listPattern = NA, test = TRUE, overwrite = FALSE) {
  # uppercase = u$1
  if(baseFolder == "Finder"){
    baseFolder = system(intern = TRUE, "osascript -e 'tell application \"Finder\" to get the POSIX path of (target of front window as alias)'")
    message("Using front-most Finder window: ", baseFolder)
  } else if(baseFolder == "") {
    baseFolder = paste(dirname(file.choose(new = FALSE)), "/", sep = "") ## choose a directory
    message("Using selected folder: ", baseFolder)
  }
  if(is.na(listPattern)){
    listPattern = findStr
  }
  a = list.files(baseFolder, pattern = listPattern)
  message("found ", length(a), " possible files")
  changed = 0
  for (fn in a) {
    findB = grepl(pattern = findStr, fn) # TRUE if found
    if(findB){
      fnew = gsub(findStr, replacement = replaceStr, fn) # replace all instances
      if(test){
        message("would change ", fn, " to ", fnew)
      } else {
        # file.path() works whether or not baseFolder ends in a slash
        if((!overwrite) && file.exists(file.path(baseFolder, fnew))){
          message("renaming ", fn, " to ", fnew, " failed as it already exists. To overwrite, set overwrite = TRUE")
        } else {
          file.rename(file.path(baseFolder, fn), file.path(baseFolder, fnew))
          changed = changed + 1
        }
      }
    } else {
      if(test){
        # message(paste("bad file", fn))
      }
    }
  }
  message("changed ", changed)
}

Resources