So, I am running the following code:
dirtyFolder = "Myfolder/test"
filenames = list.files(dirtyFolder, pattern="*.png")
for (f in filenames)
{
print(f)
imgX = readPNG(file.path(dirtyFolder, f))
x = data.table(img2vec(imgX), kmeansThreshold(imgX))
setnames(x, c("raw", "thresholded"))
yHat = predict(gbm.mod, newdata=x, n.trees = best.iter)
img = matrix(yHat, nrow(imgX), ncol(imgX))
img.dt=data.table(melt(img))
names.dt<-names(img.dt)
setnames(img.dt,names.dt[1],"X1")
setnames(img.dt,names.dt[2],"X2")
Numfile = gsub(".png", "", f, fixed=TRUE)
img.dt[,id:=paste(Numfile,X1,X2,sep="_")]
write.table(img.dt[,c("id","value"),with=FALSE], file = "submission.csv", sep = ",", col.names = (f == filenames[1]),row.names = FALSE,quote = FALSE,append=(f != filenames[1]))
# show a sample
if (f == "4.png")
{
writePNG(imgX, "train_101.png")
writePNG(img, "train_cleaned_101.png")
}
}
Basically, it takes noisy images as input and removes the noise from them. This is only the later part of the code, which applies a model fitted on a training dataset (not shown here).
Now, I can't figure out how to save the cleaned image for every image in the test folder, not just for 4.png. If the input image is named x.png, the output image should be named x_cleaned.png (for example, 4_cleaned.png for 4.png) and be saved in a separate folder in the same directory. How can I do it?
Tldr; I just want to save the variable named img for each of the filename as number_cleaned.png where number corresponds to the original file name. These new files should be saved in a separate folder.
Alright, so construct the output filename using file.path and a function such as paste or sprintf:
folder_name = 'test'
output_filename_pattern = file.path(folder_name, '%s_cleaned.png')
remove_extension = function (filename)
# strip the final dot and everything after it, e.g. "4.png" becomes "4"
gsub('\\.[^.]*$', '', filename)
for (f in filenames) {
# … your code here …
new_filename = sprintf(output_filename_pattern, remove_extension(f))
# … save file here …
}
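To put the cleaned images in a separate folder, the folder has to exist before anything is written to it. Here is a minimal sketch of the path handling only. The folder name `cleaned`, the use of `tempdir()` and the two sample file names are placeholders I made up, and the `writePNG` call is left as a comment because `img` comes from your own loop:

```r
# Hypothetical output folder; replace tempdir() with your own directory.
output_dir <- file.path(tempdir(), "cleaned")
dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)

# Sample input names standing in for list.files(dirtyFolder, ...)
filenames <- c("4.png", "17.png")

# tools::file_path_sans_ext() drops the extension, so "4.png" gives "4"
output_paths <- file.path(
  output_dir,
  paste0(tools::file_path_sans_ext(filenames), "_cleaned.png")
)

for (i in seq_along(filenames)) {
  # ... compute `img` for filenames[i] as in the question ...
  # png::writePNG(img, output_paths[i])
  message(filenames[i], " -> ", output_paths[i])
}
```

Building all the output paths up front also makes it easy to check them before the (slow) prediction step runs.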
I have a question on reading large txt files and separate it based on the character "TIME".
Each "TIME" represents the pressure of a spatial area at a particular point in time.
How should I write the readtext functions that recognize the "TIME" characters and split them ?
I would first create a folder so that I can save the new files in it. Also, I would put the original data file in this folder.
# setwd("....") # Set the working directory as the folder you just created.
I saved the data structure that you provided in "data.txt"
The following lines will split your data (which is in "data.txt" in my computer) into files that have consecutive names, such as "data1.txt", "data2.txt", and so on.
incon = file("data.txt", "r")
i = 0
while (TRUE) {
line = readLines(incon, n = 1)
if (length(line) == 0) {
break
}
if (regexpr("TIME:", line) > 0) {
if (exists("outcon")) close(outcon)
i = i + 1
outcon = file(paste0("data", i, ".txt"), "w")
writeLines(line, outcon)
} else {
writeLines(line, outcon)
}
}
close(outcon)
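If the file fits in memory, the same split can probably be done without reading line by line. The sketch below is only an illustration under that assumption: the sample lines are invented (I don't know what your real file looks like), and the output goes to a temporary folder. Every line containing "TIME:" starts a new block, and `cumsum` turns those markers into block numbers:

```r
# Invented sample lines standing in for readLines("data.txt")
lines <- c("TIME: 1", "0.5 0.7", "0.6 0.8", "TIME: 2", "0.4 0.9")

# Each "TIME:" line increments the block counter
block_id <- cumsum(grepl("TIME:", lines, fixed = TRUE))
blocks <- split(lines, block_id)

# Hypothetical output folder
out_dir <- file.path(tempdir(), "time_blocks")
dir.create(out_dir, showWarnings = FALSE)

out_files <- file.path(out_dir, paste0("data", names(blocks), ".txt"))
for (k in seq_along(blocks)) {
  writeLines(blocks[[k]], out_files[k])
}
```

Note that any lines before the first "TIME:" line would get block number 0 and end up in "data0.txt".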
I am new to R, and have built a working series of loops that copies and renames a single file from multiple subfolders.
Great it works!! ... but what is this question about then?
Well, I have learned enough of R to know that most users agree that one should avoid loops and use functions instead whenever possible. This is because functions are a lot faster than loops.
When I apply my loop to my real data, it will have to cycle through +10000 files, and I would therefore like to make it as fast as possible.
Can my loop be expressed by using functions instead?
Alternatively can the loop be optimized in some way?
Any examples or suggestions on how to achieve any of the above will be much appreciated.
Info on my file-structure:
F:/Study
1000
1001
Gait1
annotations.txt
Gait2
annotations.txt
1002
Gait1
annotations.txt
Gait2
annotations.txt
59000
59003
Gait1
annotations.txt
Gait2
annotations.txt
nb. the "Gait" folders contain many other files and directories, but I am only interested in the "annotations.txt" files.
My loop:
for( a in seq(1000, 99000, by = 1000) ) {
if (!dir.exists(paste0("F:/Study/", a))) {
next()
}
for ( b in seq(1, 200, by = 1) ) {
if (!dir.exists(paste0("F:/Study/", a,"/", a+b))) {
next()
}
for ( c in seq(1, 2, by = 1)) {
if (!dir.exists(paste0("F:/Study/", a,"/", a+b, "/", "Gait", c))) {
next()
}
file.copy(from = paste0("F:/Study/", a,"/", a+b, "/", "Gait", c,"/annotations.txt"), to = "F:/Study/Annotations", overwrite = TRUE)
setwd("F:/Study/Annotations")
file.rename( from = "annotations.txt", to = paste0(a+b, "_Gait", c, "_annotations.txt") )
}
}
}
Result is files in my Annotations folder called:
1001_Gait1_annotations
1001_Gait2_annotations
1002_Gait1_annotations
1002_Gait2_annotations
59003_Gait1_annotations
59003_Gait2_annotations
tl;dr Can the loop be expressed using functions? How?
You might try the following (I have assumed that your /Annotations directory already exists). Does that work for you?
#get all file names (full path) of files called exactly "annotations.txt" within all folders in directory
#(the anchored pattern avoids picking up already-renamed copies in /Annotations)
files <- list.files("F:/Study/", full.names = TRUE, recursive = TRUE, pattern = "^annotations\\.txt$")
#extract the desired parts for the new filename with regex
#\\d+ stands for one or more digits, .* for anything and $ for the end of line
#extract only the second capturing group (the parts within parenthesis) via \\2
filenames_new <- gsub("(.*/)(\\d+/Gait\\d+.*$)", "\\2", files)
filenames_new <- gsub("/", "_", filenames_new)
#create the new filepaths
files_new <- paste0("F:/Study/Annotations/", filenames_new)
#perform copy
file.copy(from = files, to = files_new)
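If the regex feels fragile, the new names can also be assembled from the path components with `basename` and `dirname`. This is just a sketch on two made-up paths that follow the folder layout from the question, so nothing is copied here:

```r
# Sample paths following the layout in the question
files <- c(
  "F:/Study/1000/1001/Gait1/annotations.txt",
  "F:/Study/59000/59003/Gait2/annotations.txt"
)

subject <- basename(dirname(dirname(files)))  # "1001", "59003"
gait    <- basename(dirname(files))           # "Gait1", "Gait2"

new_names <- paste(subject, gait, basename(files), sep = "_")

# With real files, the copy would then be:
# file.copy(from = files, to = file.path("F:/Study/Annotations", new_names))
```

Either way, `file.copy` is called once on whole vectors, so there is no explicit loop left.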
This line gives me hierarchical directory paths down to the files:
dirs<- as.data.frame(list.dirs(path = rootdir, full.names = F, recursive = T))
Like so:
"","list.dirs(path = rootdir, full.names = F, recursive = T)"
"1",""
"2","19"
"3","19/H"
"4","19/H/BA"
"5","19/H/BA/2016"
"6","19/H/BA/2016/11"
"7","19/H/BA/2016/11/10"
"8","19/H/BA/2016/11/10/0" # files are in here
"9","19/H/BA/2016/12"
"10","19/H/BA/2016/12/20"
"11","19/H/BA/2016/12/20/0" # files are in here
"12","19/H/BA/2017"
"13","19/H/BA/2017/1"
"14","19/H/BA/2017/1/19"
"15","19/H/BA/2017/1/19/0" # files are in here
"16","19/H/BA/2017/1/29"
"17","19/H/BA/2017/1/29/0" # files are in here
"18","19/H/BA/2017/3"
"19","19/H/BA/2017/3/20"
"20","19/H/BA/2017/3/20/0" # files are in here
But how would I write the code to only give me the paths to the files? I.e.,
"19/H/BA/2016/11/10/0"
"19/H/BA/2016/12/20/0"
"19/H/BA/2017/1/19/0"
"19/H/BA/2017/1/29/0"
"19/H/BA/2017/3/20/0"
You can use dirname instead of a regular expression; this also handles special cases like rootdir == "C:/" or rootdir == "../":
unique(dirname(list.files(rootdir, recursive = TRUE)))
We can use list.files to get the paths of all the files that are present (so that it doesn't give us any empty directory paths).
filepath = list.files(rootdir, recursive = TRUE)
Now this will have the path to every file; we can use sub to remove the file names from it and keep only the directory part.
sub("/[^/]*$", "", filepath)
This removes everything from the last / onwards. Finally, to avoid duplication we can take unique of it.
To do everything in a one-liner:
unique(sub("/[^/]*$", "", list.files(rootdir, recursive = TRUE)))
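To see the `dirname` approach in action without touching real data, here is a small self-contained sketch. It builds a throwaway tree under `tempdir()`; the folder and file names are made up to resemble the listing in the question:

```r
# Throwaway directory tree (hypothetical names)
root <- file.path(tempdir(), "dir_demo")
unlink(root, recursive = TRUE)

leaf_dirs <- c("19/H/BA/2016/11/10/0", "19/H/BA/2017/1/19/0")
for (d in leaf_dirs) {
  dir.create(file.path(root, d), recursive = TRUE)
  file.create(file.path(root, d, c("a.txt", "b.txt")))
}

# Only directories that actually contain files are returned
file_dirs <- unique(dirname(list.files(root, recursive = TRUE)))
print(file_dirs)
```

Intermediate folders such as "19/H/BA/2016" do not appear, because they contain no files of their own.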
I have over 700 files in one folder. For the first month, files number 1 to number 9 are named:
water_200101_01.img
water_200101_09.img
and files number 10 to number 30 are named:
water_200101_10.img
water_200101_30.img
And so on for the second month, where files number 1 to number 9 are named:
water_200102_01.img
water_200102_09.img
and files number 10 to number 30 are named:
water_200102_10.img
water_200102_30.img
How can I rename them without making any changes to the files themselves? I just want to change the names, for example:
water_1
water_2
...till...
water_700
file.rename will rename files, and it can take a vector of both from and to names.
So something like:
file.rename(list.files(pattern = "^water_.*\\.img$"), paste0("water_", 1:700))
might work.
If you care about the order specifically, you could either sort the list of files that currently exist, or if they follow a particular pattern, just create the vector of filenames directly (although I note that 700 is not a multiple of 30).
I will set aside the question, "why would you want to?" since you seem to be throwing away information in the filename, but presumably that information is contained elsewhere as well.
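Here is a rough sketch of the "sort, then rename" idea, run on three dummy files in a temporary folder so that it is safe to try. The folder name is hypothetical, and keeping the ".img" extension on the new names is my assumption (drop it if you really want bare names like water_1). Note that `pattern` in list.files is a regular expression, not a shell glob:

```r
# Throwaway folder with three dummy files (hypothetical names)
demo_dir <- file.path(tempdir(), "water_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
file.create(file.path(demo_dir, c("water_200101_01.img",
                                  "water_200101_02.img",
                                  "water_200102_01.img")))

# Zero-padded dates and day numbers sort correctly as plain text
old_files <- sort(list.files(demo_dir,
                             pattern = "^water_[0-9]{6}_[0-9]{2}\\.img$",
                             full.names = TRUE))
new_files <- file.path(demo_dir, paste0("water_", seq_along(old_files), ".img"))

# Refuse to overwrite anything that already exists
stopifnot(!any(file.exists(new_files)))
ok <- file.rename(old_files, new_files)
```

Using seq_along instead of a hard-coded 1:700 keeps the two vectors the same length whatever the actual file count is.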
I wrote this for myself. It is fast, allows regex in find and replace, can ignore the file suffix, and can show what would happen in a "trial run" as well as protect against over-writing existing files.
If you are on a Mac, it can use AppleScript to pick out the current folder in the Finder as a target folder.
umx_rename_file <- function(findStr = "Finder", replaceStr = NA, baseFolder = "Finder", test = TRUE, ignoreSuffix = TRUE, listPattern = NULL, overwrite = FALSE) {
umx_check(!is.na(replaceStr), "stop", "Please set a replaceStr to the replacement string you desire.")
# ==============================
# = 1. Set folder to search in =
# ==============================
if(baseFolder == "Finder"){
baseFolder = system(intern = TRUE, "osascript -e 'tell application \"Finder\" to get the POSIX path of (target of front window as alias)'")
message("Using front-most Finder window:", baseFolder)
} else if(baseFolder == "") {
baseFolder = paste(dirname(file.choose(new = FALSE)), "/", sep = "") ## choose a directory
message("Using selected folder:", baseFolder)
}
# =================================================
# = 2. Find files matching listPattern or findStr =
# =================================================
a = list.files(baseFolder, pattern = listPattern)
message("found ", length(a), " possible files")
changed = 0
for (fn in a) {
if(grepl(pattern = findStr, fn, perl= TRUE)){
if(ignoreSuffix){
# pull suffix and baseName (without suffix)
baseName = sub(pattern = "(.*)(\\..*)$", x = fn, replacement = "\\1")
suffix = sub(pattern = "(.*)(\\..*)$", x = fn, replacement = "\\2")
fnew = gsub(findStr, replacement = replaceStr, x = baseName, perl= TRUE) # replace all instances
fnew = paste0(fnew, suffix)
} else {
fnew = gsub(findStr, replacement = replaceStr, x = fn, perl= TRUE) # replace all instances
}
if(test){
message(fn, " would be changed to: ", omxQuotes(fnew))
} else {
if((!overwrite) & file.exists(paste(baseFolder, fnew, sep = ""))){
message("renaming ", fn, " to ", fnew, " failed as it already exists. To overwrite, set overwrite = TRUE")
} else {
file.rename(paste0(baseFolder, fn), paste0(baseFolder, fnew))
changed = changed + 1;
}
}
}else{
if(test){
# message(paste("bad file",fn))
}
}
}
if(test & changed==0){
message("set test = FALSE to actually change files.")
} else {
umx_msg(changed)
}
}
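For readers who just want the core idea (regex find and replace with a trial run and overwrite protection) in base R, here is a stripped-down sketch. It is not the function above: `rename_with_regex` and its arguments are names I made up for illustration, and the demo runs on dummy files in a temporary folder:

```r
# Hypothetical helper: returns the rename plan, and only renames when test = FALSE
rename_with_regex <- function(folder, find, replace, test = TRUE) {
  old <- list.files(folder)
  hit <- grepl(find, old, perl = TRUE)
  plan <- data.frame(from = old[hit],
                     to = gsub(find, replace, old[hit], perl = TRUE),
                     stringsAsFactors = FALSE)
  if (!test) {
    # Skip targets that already exist instead of overwriting them
    clash <- file.exists(file.path(folder, plan$to))
    plan$renamed <- rep(FALSE, nrow(plan))
    plan$renamed[!clash] <- file.rename(file.path(folder, plan$from[!clash]),
                                        file.path(folder, plan$to[!clash]))
  }
  plan
}

# Demo on dummy files
demo_dir <- file.path(tempdir(), "rename_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
file.create(file.path(demo_dir, c("foo_1.txt", "foo_2.txt", "bar.txt")))

dry_run <- rename_with_regex(demo_dir, "^foo", "bob")               # nothing changes
result  <- rename_with_regex(demo_dir, "^foo", "bob", test = FALSE) # files renamed
```

Returning the plan as a data frame makes it easy to inspect the trial run before committing to it.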
You can also replace the part of each file name that matches a given pattern with another string, which is useful for renaming several files at once. For example, this code takes all of your files containing foo and replaces foo with bob in the file names (str_replace comes from the stringr package).
library(stringr)
file.rename(list.files(pattern = "foo"), str_replace(list.files(pattern = "foo"), pattern = "foo", "bob"))
The following was my workaround for matching in sequence and changing all the filenames in a specified directory using simple base code.
# pattern is a regular expression, not a shell glob
old_files <- list.files(path = ".", pattern = "^water_.*\\.img$")
# Build the new names as a character vector, one per old file
new_files <- paste0("water_", seq_along(old_files), ".img")
# Rename the old files to the new names
file.rename(from = old_files, to = new_files)
There are some .fcs files in a data.000X format (where X = 1, 2, 3...) in a directory.
I want to rename each file to the following format: exp.fcs (where exp is a text taken from a vector), provided the file to be renamed is an .fcs file.
In other words: I want to rename the files to exp.fcs, where exp is a piece of text and not a consecutive sequence of letters, i.e. F, cA, K, etc.
For example, from:
data.0001, data.0002, data.0003, data.0004, data.0005, data.0006...
to
textF_a.fcs, textF_b.fcs, textF_c.fcs, textVv_a.fcs, textVv_b.fcs, textVv_c.fcs ...
I tried to do it with file.rename(from, to) but failed as the arguments have different lengths (and I don't know what it means):
a <- list.files(path = ".", pattern = "data.*$")
b <- paste("data", 1:1180, ".fcs", sep = "")
file.rename(a, b)
Based on your comments, one issue is that your first file isn't named "data.001" - it's named "data.1". Use this:
b <- sprintf("data%.4d.fcs", seq_along(a))
It pads indices below 1000 with up to three leading zeros so that all names have the same width (since it seems you have 1000+ files, this may be better). If you really just want to see things like "data.001", then use %.3d in the command.
Your code "works" on my machine ("works" in the sense that, when I created a set of files and followed your procedure, the renaming happened correctly). The error is likely that the number of files that you have (length(a)) is different from the number of new names that you give (length(b)). Post back if it turns out that these objects do have the same length.
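To make the length check concrete, and to show one way of building names like textF_a.fcs from a vector, here is a small sketch. The values in `groups` and `replicates` are guesses based on the example in the question, and `a` is a stand-in for the result of list.files, so adjust all three to your data:

```r
# Stand-in for a <- list.files(path = ".", pattern = "data.*$")
a <- sprintf("data.%04d", 1:6)

# Assumed naming scheme: each group label gets replicates a, b, c
groups <- c("F", "Vv")
replicates <- letters[1:3]

b <- paste0("text",
            rep(groups, each = length(replicates)), "_",
            rep(replicates, times = length(groups)),
            ".fcs")

# file.rename() needs exactly one new name per old name
stopifnot(length(a) == length(b))

# With real files, the rename would then be:
# file.rename(a, b)
```

If the stopifnot line fails on your real data, printing length(a) and length(b) should show which side has the unexpected count.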
As with a very similar question, this function might be of use to you. I wrote it to allow regex find and replace in R. If you're on a Mac it can detect and use the frontmost Finder window as a target. It also supports test runs, overwrite control, and filtering large folders.
umxRenameFile <- function(baseFolder = "Finder", findStr = NA, replaceStr = NA, listPattern = NA, test = T, overwrite = F) {
# uppercase = u$1
if(baseFolder == "Finder"){
baseFolder = system(intern = T, "osascript -e 'tell application \"Finder\" to get the POSIX path of (target of front window as alias)'")
message("Using front-most Finder window:", baseFolder)
} else if(baseFolder == "") {
baseFolder = paste(dirname(file.choose(new = FALSE)), "/", sep = "") ## choose a directory
message("Using selected folder:", baseFolder)
}
if(is.na(listPattern)){
listPattern = findStr
}
a = list.files(baseFolder, pattern = listPattern)
message("found ", length(a), " possible files")
changed = 0
for (fn in a) {
findB = grepl(pattern = findStr, fn) # returns 1 if found
if(findB){
fnew = gsub(findStr, replace = replaceStr, fn) # replace all instances
if(test){
message("would change ", fn, " to ", fnew)
} else {
if((!overwrite) & file.exists(paste(baseFolder, fnew, sep = ""))){
message("renaming ", fn, " to ", fnew, " failed as it already exists. To overwrite, set overwrite = TRUE")
} else {
file.rename(paste(baseFolder, fn, sep = ""), paste(baseFolder, fnew, sep = ""))
changed = changed + 1;
}
}
}else{
if(test){
# message(paste("bad file",fn))
}
}
}
message("changed ", changed)
}