Renaming files from RStudio with a data frame as reference - r

I'm trying to rename files in a working-directory folder from RStudio. The files are named with IDs, and I want to replace the IDs with names. I have a reference data frame (urba_o) with the columns supplierid, CompanyName, and vendornumber. I tried this for loop, but it doesn't work; it fails with "the condition has length > 1 and only the first element will be used". Any ideas where I'm going wrong?
original_names <- list.files()
# forward slashes: "\U" inside an R string is an invalid escape sequence
urba_o <- import("C:/Users/MaangiJ/Downloads/urba_o.xlsx")
# for loop
for (x in original_names){
  if(x == urba_o$supplierid[]){
    file.rename(x,urba_o$CompanyName[])
  }
}

file.rename is vectorized, so no for loop is needed. Something like this should work:
## figure out which files are here and need renaming
rows_to_rename = urba_o$supplierid %in% original_names
## rename them
with(urba_o[rows_to_rename, ], file.rename(supplierid, CompanyName))
If you did want a for loop, this would work (though it will be less efficient, as well as longer to write):
for (i in 1:nrow(urba_o)) {
  if(urba_o$supplierid[i] %in% original_names) {
    file.rename(urba_o$supplierid[i], urba_o$CompanyName[i])
  }
}
Do note that you'll need to follow the file name rules for your operating system. For example, on Windows, file names can't contain the following reserved characters: < > : " / \ | ? *
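One thing to watch for: list.files() returns names with their extensions, while an ID column usually has none, so an exact %in% comparison may match nothing. Below is a minimal sketch of one way around that, assuming the files are named <supplierid>.<extension>. The demo folder, the sample rows in urba_o, and the helper safe_name() are all made up for illustration and are not part of the original question.

```r
# Demo setup in a temporary folder (hypothetical data, not the asker's files)
demo_dir <- tempfile("rename_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("101.pdf", "102.xlsx", "999.txt")))

urba_o <- data.frame(
  supplierid  = c(101, 102, 103),
  CompanyName = c("Acme Ltd", "Foo/Bar: Inc", "Unused Co"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace characters that Windows forbids in file names
safe_name <- function(x) gsub('[<>:"/\\\\|?*]', "_", x)

old_files <- list.files(demo_dir)
ids  <- tools::file_path_sans_ext(old_files)   # file name without extension
exts <- tools::file_ext(old_files)             # extension without the dot

# Look up each file's ID in the reference table (NA when there is no match)
idx   <- match(ids, as.character(urba_o$supplierid))
found <- !is.na(idx)

# New name = cleaned company name + original extension
new_files <- paste0(safe_name(urba_o$CompanyName[idx[found]]), ".", exts[found])

# file.rename() is vectorized and returns one TRUE/FALSE per file
renamed <- file.rename(file.path(demo_dir, old_files[found]),
                       file.path(demo_dir, new_files))
```

Files whose ID is not in the table (999.txt here) are left alone, and the logical vector returned by file.rename() shows which renames succeeded.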

Related

File renaming (string substitution) without a clear pattern using R

Currently, I am working with a long list of files.
They follow the name pattern SB_xxx_(part).(extension), where xxx is an item code, the part suffix is optional, and the extensions vary.
SB_19842.png
SB_19842_head.png
SB_19842_hand.png
SB_19842_head.pdf
...
It turns out that many of these codes are incorrect.
I have two columns at hand: one with the old codes and one with the new codes (let's say A and B). I want to replace every old code in the file names with the corresponding new code.
old new
12154 24124
92482 02425
.....
My first thought was to use file.rename().
However, it renames files one-to-one, and I cannot list the pairs by hand because every item has a different number of parts and different file extensions.
Is there a method that can go through all the file names, find the strings in A, and replace them with the matching strings in B? Does anyone have an idea, please?
A loop solution with purrr::map2 at the end:
library(purrr)
#create files to rename
file.create("SB_19842.png")
file.create("SB_19842_head.png")
file.create("SB_19842_hand.png")
file.create("SB_19842_head.pdf")
file.create("SB_12154.png")
file.create("SB_12154_head.png")
file.create("SB_12154_hand.png")
file.create("SB_12154_head.pdf")
# a data frame with old and new patterns
file_names <- data.frame(
  old = c("19842", "12154"),
  new = c("new1", "new2")
)
# old filenames from the directory, specify path if needed
file_names_SB <- list.files(pattern = "SB_")
# function to substitute one type of code with another
sub_one_code <- function(old_code, new_code, file_names_original){
  gsub(paste0("SB_", old_code), paste0("SB_", new_code), file_names_original)
}
# loop to substitute all codes
new_file_names <- file_names_SB
for (row in 1:nrow(file_names)){
  new_file_names <- sub_one_code(file_names[row, "old"], file_names[row, "new"], new_file_names)
}
# rename all the files
map2(file_names_SB,
new_file_names,
file.rename)
@thelatemail provided a link with more elegant solutions for generating new file names.
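For comparison, here is a base R sketch that builds all the new names in one vectorized step instead of looping over the codes. It assumes that every code sits directly after "SB_" and contains neither "_" nor "."; the demo folder and the lookup table codes are invented for illustration.

```r
# Demo files in a temporary folder (hypothetical)
demo_dir <- tempfile("sb_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("SB_12154.png", "SB_12154_head.pdf",
                                  "SB_92482_hand.png", "SB_55555.png")))

# Lookup table; codes are kept as character so leading zeros survive
codes <- data.frame(old = c("12154", "92482"),
                    new = c("24124", "02425"),
                    stringsAsFactors = FALSE)

old_names <- list.files(demo_dir, pattern = "^SB_")

# Pull out the code that follows "SB_" (everything up to the next "_" or ".")
file_code <- sub("^SB_([^_.]+).*$", "\\1", old_names)

# Position of each file's code in the lookup table (NA = nothing to change)
idx    <- match(file_code, codes$old)
to_fix <- !is.na(idx)

# Swap only the code part, leaving the part suffix and extension untouched
new_names <- old_names
new_names[to_fix] <- paste0("SB_", codes$new[idx[to_fix]],
                            sub("^SB_[^_.]+", "", old_names[to_fix]))

ok <- file.rename(file.path(demo_dir, old_names[to_fix]),
                  file.path(demo_dir, new_names[to_fix]))
```

Because each file is looked up exactly once, a new code that happens to equal another old code is not substituted a second time, which can happen when gsub is applied repeatedly in a loop.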

R: Deleting Files in R based on their Names

I am working with the R programming language.
I found the following related question on Stack Overflow (How to delete a file with R?), which shows how to delete a file with a specific name from the working directory:
#Define the file name that will be deleted
fn <- "foo.txt"
#Check its existence
if (file.exists(fn)) {
  #Delete file if it exists
  file.remove(fn)
}
[1] TRUE
My Question: Is it possible to delete files based on whether the file name contains a specific combination of letters (i.e. LIKE 'fo%' )? This way, all files in the working directory starting with the letters "fo" will be deleted.
What I tried so far:
I thought of a way where I could first create a list of all files in the working directory that I want to delete based on their names:
# create list of all files in working directory
a = getwd()
path.to.csv <- a
files<-list.files(path.to.csv)
my_list = print(files) ## list all files in path
#identify files that match the condition
to_be_deleted = my_list[grepl("fo",unlist(my_list))]
Then, I tried to delete these files using the command from earlier:
if (file.exists(to_be_deleted)) {
  #Delete file if it exists
  file.remove(to_be_deleted)
}
This returned the following message:
[1] TRUE TRUE TRUE TRUE TRUE TRUE
Warning message:
In if (file.exists(to_be_deleted)) { :
the condition has length > 1 and only the first element will be used
Does anyone know if I have done this correctly? Suppose if there were multiple files in the working directory where the names of these files started with "fo" - would all of these files have been deleted? Or only the first file in this list?
Can someone please show me how to do this correctly?
Thanks!
file.remove accepts a vector of file names to delete.
file.exists also accepts a vector, but it returns a logical vector with one value per file. That won't work with if, which requires a single logical value.
However, you don't need to check the existence of files that you get from list.files: they obviously exist.
So, the simplest is to remove the if test and just call file.remove:
files <- list.files(path, pattern = "fo")
to_be_deleted <- grep("fo", files, value = TRUE)
file.remove(to_be_deleted)
Or even simpler:
to_be_deleted <- list.files(path, pattern = "fo")
file.remove(to_be_deleted)
A few notes, however:
First, you don't know in advance whether you have the right to delete these files.
Second, you don't know whether the names are indeed files or directories (or something else). It's tempting to believe that file.exists answers this second question, that is, that it tells you a name is a real file, but it does not: file.exists(path) also returns TRUE when path is a directory. However, you can detect directories with dir.exists(path). Depending on your specific case, it may or may not be necessary to check for this (for instance, if you know the pattern passed to grep only ever matches files, it's OK).
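Putting these notes together, a minimal sketch might look like the following. Two details differ from the snippets above and are worth stating as assumptions: the pattern "^fo" is anchored so that only names starting with "fo" match (a bare "fo" also matches "info.txt"), and full.names = TRUE is used because list.files() otherwise returns names relative to the listed folder, which file.remove() can only find when that folder is the working directory. The demo folder and its contents are hypothetical.

```r
# Demo folder with a few files and one directory (hypothetical)
demo_dir <- tempfile("delete_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("foo.txt", "fox.csv", "info.txt", "bar.txt")))
dir.create(file.path(demo_dir, "folder"))   # a directory whose name also starts with "fo"

# "^fo" anchors the match at the start of the name, like SQL's LIKE 'fo%'
candidates <- list.files(demo_dir, pattern = "^fo", full.names = TRUE)

# Keep regular files only; dir.exists() is TRUE for directories
to_be_deleted <- candidates[!dir.exists(candidates)]

# file.remove() is vectorized and returns one TRUE/FALSE per file
deleted <- file.remove(to_be_deleted)
```

Any FALSE in the returned vector would point to a file that could not be removed, for example because of missing permissions.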

How to rename part of a file

I would like to rename part of a file name, because the structure is hardcoded in getfiles.
I have metabolomics mzML files containing ltQCs, sQCs and samples, but the file names have different lengths (6, 6 and 7 parts, respectively). I am trying to run XCMS, but it only picks up the ltQCs and sQCs, because the structure is hardcoded to 6. How do I change the structure of the file name? See the example below:
2020-02-02_B1W1_RP_NEG_P7_A20_001.mzML (structure of 7)
to
2020-02-02_B1W1_RP_NEG_P7A20_001.mzML (structure of 6)
I have highlighted the part that I would like to change. If this is impossible, maybe renaming the ltQCs and sQCs would be easier: adding a letter or number would give them a structure of 7, and I could then change the structure in getfiles to 7.
Hope somebody can help, thank you:)
Best
You can change the file names with a regular expression, using gsub to remove the penultimate underscore:
my_regex <- "(_)([[:alnum:]]{3}_[[:alnum:]]{3}[.]mzML)"
my_filename <- "2020-02-02_B1W1_RP_NEG_P7_A20_001.mzML"
gsub(my_regex, "\\2", my_filename)
#> [1] "2020-02-02_B1W1_RP_NEG_P7A20_001.mzML"
So you could do something like
rename_mzMLs <- function(directory)
{
  # full.names = TRUE so the rename also works when `directory` is not the working directory
  filenames <- list.files(directory, pattern = "[.]mzML$", full.names = TRUE)
  my_regex <- "(_)([[:alnum:]]{3}_[[:alnum:]]{3}[.]mzML)"
  new_filenames <- gsub(my_regex, "\\2", filenames)
  file.rename(filenames, new_filenames)
}
And run it by doing
rename_mzMLs("C:/path/to/mzML/files/")
Obviously, I can't test this since I don't have any mzML files, so ensure you back up your files before running this function!
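One caveat: the regular expression above removes an underscore from any name that ends in three alphanumerics, an underscore, three alphanumerics and ".mzML", which could also hit the QC files that already have 6 parts. A hedged sketch that only touches names with exactly 7 underscore-separated parts is shown below. The sQC file name is invented, since the question does not show one.

```r
filenames <- c("2020-02-02_B1W1_RP_NEG_P7_A20_001.mzML",   # 7 parts (sample)
               "2020-02-02_B1W1_RP_NEG_sQC_001.mzML")      # 6 parts (hypothetical QC name)

# Count the underscore-separated parts in each name
n_parts <- lengths(strsplit(filenames, "_", fixed = TRUE))

# Drop the penultimate underscore, but only where there are 7 parts
my_regex <- "_([[:alnum:]]+_[[:alnum:]]+[.]mzML)$"
new_filenames <- ifelse(n_parts == 7,
                        sub(my_regex, "\\1", filenames),
                        filenames)
```

The resulting vector can be passed to file.rename() together with the old names, after prefixing both with the folder path via file.path().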

Adding new column and its value based on file name

I have 10 data files in my current directory such as data-01, data-02, data-03, data-04 till data-10.
Each of these data files has a few hundred rows with 4 fields. I would like to add a new column named "ID" whose value is the file's number, for example 01 for every row that comes from data-01.
A base R solution using a loop would go like this:
df <- c()
# pattern is a regular expression, not a glob, so match the extension explicitly
for (x in list.files(pattern = "\\.csv$")) {
  u <- read.table(x)
  u$Label <- factor(x)
  df <- rbind(df, u)
  cat(x, "\n")
}
This depends on your data files having the same number of columns (though you can get around that inside the loop by selecting the columns you need before rbind), and you can set whichever file type you are reading. The cat call is useful because it lets you trace read problems (because there are always problems). I bet there is a better way to do this with apply as well.
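As a sketch of the apply-style variant, the files can be read with lapply and stacked with do.call(rbind, ...). This assumes the files are named data-01, data-02, and so on without an extension, as in the question, and that read.table's defaults suit their layout. The demo files and the helper read_with_id() are hypothetical.

```r
# Two small demo files with 4 fields each (hypothetical data)
demo_dir <- tempfile("id_demo_")
dir.create(demo_dir)
for (id in c("01", "02")) {
  write.table(data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12),
              file.path(demo_dir, paste0("data-", id)),
              row.names = FALSE, col.names = FALSE)
}

files <- list.files(demo_dir, pattern = "^data-[0-9]+$", full.names = TRUE)

# Hypothetical helper: read one file and tag every row with the file's number
read_with_id <- function(path) {
  u <- read.table(path)
  # Strip the "data-" prefix; kept as character so "01" stays "01"
  u$ID <- sub("^data-", "", basename(path))
  u
}

combined <- do.call(rbind, lapply(files, read_with_id))
```

Keeping ID as character preserves the leading zero; converting it with as.integer() would turn "01" into 1.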

How to store file names that partially match by looping through the directories in R?

Basically this is a for loop question. I have my.dir containing directories A, B and C. A has files apple.txt,orange.txt; B has files tomato.txt, carrot.txt; C has files cat.txt. I also have an object (myobject) with file names but partially matching names.
my.dir<- c("A","B","C")
myobject<-c("app","ora","car","Jak")
Now I want to use a for loop to collect all the files in A, B and C that match the names in myobject. Some directories may go unused; for example, directory C is not used here because nothing in it matches. I also want to keep track of which names in myobject are found and which are not. For example, Jak is not found, while app, ora and car are found.
I want to get something like this in the 'result' :
files
"apple.txt" "orange.txt" "carrot.txt"
found
"app" "ora" "car"
not found
"Jak"
This is what I have tried:
all.file.names <- {}
for (j in 1:length(my.dir)){
  all.bam.files <- list.files(my.dir[j])
  all.bam.files <- all.bam.files[grepl(".txt$",all.bam.files)]
  all.file.names <- grep(myobject, all.bam.files, value=TRUE)
}
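Two things stand out in this attempt: grep() uses only the first element of myobject as its pattern, and all.file.names is overwritten on every pass through the loop. One possible approach, sketched under the assumption that "partially matching" means the file name starts with the given string, is to list all directories at once and test each name in myobject separately. The temporary directories below stand in for A, B and C.

```r
# Recreate the example layout in a temporary folder (hypothetical)
base_dir <- tempfile("match_demo_")
my.dir <- file.path(base_dir, c("A", "B", "C"))
for (d in my.dir) dir.create(d, recursive = TRUE)
file.create(file.path(my.dir[1], c("apple.txt", "orange.txt")),
            file.path(my.dir[2], c("tomato.txt", "carrot.txt")),
            file.path(my.dir[3], "cat.txt"))

myobject <- c("app", "ora", "car", "Jak")

# list.files() accepts several directories at once
all.files <- list.files(my.dir, pattern = "\\.txt$")

# For each name in myobject, collect the file names that start with it
matches <- lapply(myobject, function(p) all.files[startsWith(all.files, p)])
names(matches) <- myobject

result <- list(
  files     = unlist(matches, use.names = FALSE),
  found     = myobject[lengths(matches) > 0],
  not.found = myobject[lengths(matches) == 0]
)
```

If matches anywhere in the name are wanted instead, startsWith(all.files, p) can be swapped for grepl(p, all.files, fixed = TRUE).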
