Renaming files with the same name - r

I'm having trouble renaming files. I want to keep files that share the same name by adding an identifier ("_1", "_2", "_3", ..., "_x"), for example.
I'm using the package "R.utils" and the renameFile function.
library(R.utils)   # for renameFile
library(stringr)   # for str_replace

directory <- "C:\\Users\\javie\\Documents\\Programacion\\Script\\Prueba_Rename"
pattern <- "_TED version=_1.0___DD__RE_XXXXXXXX-2__RE__TD_52__TD__F"
setwd(directory)

renameFile(list.files(pattern = pattern),
           str_replace(list.files(pattern = pattern),
                       pattern = pattern, ""),
           overwrite = FALSE)
If the file already exists, it returns
"Error: File already exists: Guia_De_Despacho__77166.pdf"
which is good, because I don't want to overwrite anything. But how can I change the name of the file and add an identifier if it already exists?
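One possible way to handle the collision (a rough sketch in base R rather than R.utils; the helper name rename_with_suffix and the old_names/new_names objects are made up for illustration) is to check whether the target name exists and keep appending "_1", "_2", ... until it doesn't:
library(tools)

# Hypothetical helper: rename `from` to `to`, appending "_1", "_2", ...
# to the base name whenever `to` already exists.
rename_with_suffix <- function(from, to) {
  base <- file_path_sans_ext(to)
  ext  <- file_ext(to)
  candidate <- to
  i <- 1
  while (file.exists(candidate)) {
    candidate <- sprintf("%s_%d.%s", base, i, ext)
    i <- i + 1
  }
  file.rename(from, candidate)
}

# e.g. applied pairwise to the old names and the new names from str_replace:
# mapply(rename_with_suffix, old_names, new_names)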

Related

R: How to match a forward-slash in a regular expression?

How do I match on a forward slash / in a regular expression in R?
As demonstrated in the example below, I am trying to search for .csv files in a subdirectory and my attempts to use a literal / are failing. Looking for a modification to my regex in base R, not a function that does this for me.
Example subdirectory
# Create subdirectory in current working directory with two .csv files
# - remember to delete these later or they'll stay in your current working directory!
dir.create(path = "example")
write.csv(data.frame(x1 = letters), file = "example/example1.csv")
write.csv(data.frame(x2 = 1:20), file = "example/example2.csv")
Get relative paths of all .csv files in the example subdirectory
# This works for the example, but could mistakenly return paths to other files based on:
# (a) file name: foo/example1.csv
# (b) subdirectory name: example_wrong/foo.csv
list.files(pattern = "example.*csv", recursive = TRUE)
#> [1] "example/example1.csv" "example/example2.csv"
# This fixes issue (a) but doesn't fix issue (b)
list.files(pattern = "^example.*?\\.csv$", recursive = TRUE)
#> [1] "example/example1.csv" "example/example2.csv"
# Adding / to the end of `example` guarantees we get the correct subdirectory
# Doesn't work: / is special regex and not escaped
list.files(pattern = "^example/.*?\\.csv$", recursive = TRUE)
# Doesn't work: escapes / but throws error
list.files(pattern = "^example\/.*?\\.csv$", recursive = TRUE)
# Doesn't work: even with the \\ escaping in R!
list.files(pattern = "^example\\/.*?\\.csv$", recursive = TRUE)
Some of the solutions above work with regex tools but not in R. I've checked SO for solutions (most related below) but none seem to apply:
Escaping a forward slash in a regular expression
Regex string does not start or end (or both) with forward slash
Reading multiple csv files from a folder with R using regex
The pattern argument is only used for matching file (or directory) names, not the full path they are on (even when recursive and full.names are set to TRUE). That's why your last approach doesn't work even though it is the correct way to match / in a regular expression. You can get the correct file names by specifying path and setting full.names to TRUE.
list.files(path = 'example', pattern = '\\.csv$', full.names = TRUE)
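With the example subdirectory created above, this should return the same two relative paths as before:
#> [1] "example/example1.csv" "example/example2.csv"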

R: Deleting Files in R based on their Names

I am working with the R programming language.
I found the following related question on Stack Overflow (how to delete a file with R?), which shows how to delete a file with a specific name from the working directory:
# Define the file name that will be deleted
fn <- "foo.txt"
# Check its existence
if (file.exists(fn)) {
  # Delete file if it exists
  file.remove(fn)
}
[1] TRUE
My Question: Is it possible to delete files based on whether the file name contains a specific combination of letters (i.e. LIKE 'fo%' )? This way, all files in the working directory starting with the letters "fo" will be deleted.
What I tried so far:
I thought of a way where I could first create a list of all files in the working directory that I want to delete based on their names:
# create list of all files in working directory
a = getwd()
path.to.csv <- a
files<-list.files(path.to.csv)
my_list = print(files) ## list all files in path
#identify files that match the condition
to_be_deleted = my_list[grepl("fo",unlist(my_list))]
Then, I tried to delete these files using the command used earlier:
if (file.exists(to_be_deleted)) {
#Delete file if it exists
file.remove(to_be_deleted)
}
This returned the following message:
[1] TRUE TRUE TRUE TRUE TRUE TRUE
Warning message:
In if (file.exists(to_be_deleted)) { :
the condition has length > 1 and only the first element will be used
Does anyone know if I have done this correctly? Suppose there were multiple files in the working directory whose names started with "fo" - would all of these files have been deleted, or only the first file in the list?
Can someone please show me how to do this correctly?
Thanks!
file.remove accepts a vector of file names to delete.
Regarding file.exists, it also accepts a vector, but it returns a vector of logical values, one for each file, and that won't work with if, which requires a single logical value.
However, you don't need to check the existence of files that you get from list.files: they obviously exist.
So, the simplest is to remove the if test and just call file.remove:
files <- list.files(path, pattern = "fo")
to_be_deleted <- grep("fo", files, value = TRUE)
file.remove(to_be_deleted)
Or even simpler:
to_be_deleted <- list.files(path, pattern = "fo")
file.remove(to_be_deleted)
A few notes, however:
You don't know in advance whether you have the right to delete these files.
You also don't know whether the names are actually files, directories, or something else. It's tempting to believe that file.exists answers that second question, i.e. that it tells you whether a name is a real file, but it does not: file.exists(path) also returns TRUE when path is a directory. You can, however, detect directories with dir.exists(path). Depending on your specific case, it may or may not be necessary to check for this (for instance, if you know the pattern passed to grep only ever matches files, it's fine); a small sketch of that check follows below.
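If you do want to guard against a directory slipping through the pattern, a minimal sketch reusing the path and the "fo" pattern from above:
candidates <- list.files(path, pattern = "fo", full.names = TRUE)
# drop anything that is a directory rather than a regular file
to_be_deleted <- candidates[!dir.exists(candidates)]
file.remove(to_be_deleted)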

Selectively reading in csv files in R where filenames don't end in a particular suffix

I have a folder of csv files from an experiment that I want to read into R for analysis. The folder contains two files for every participant, with filenames following the pattern:
"participantID.csv"
"participantID_debrief.csv"
I want to create two variables in R, one for the standard data files and one for the debrief files. I have found the list.files function and see that the standard way to use it would be something like:
files <- list.files(path="D:/data", pattern=".csv")
But I want to use the pattern parameter to match first only the filenames that don't end in "_debrief.csv" and then only the ones that do. How would I write the regular expression (assuming that's what pattern is) to achieve this?
Try:
files = list.files(path="D:/data")
non_debrief = files[!grepl("_debrief.csv", files)]
debrief = files[grepl("_debrief.csv", files)]
For a tidyverse approach, you could use the fs library (https://www.tidyverse.org/blog/2018/01/fs-1.0.0/).
base_dir = 'D:/data/'
file_list_debrief = fs::dir_ls(base_dir, glob = '*_debrief.csv')
file_list_non_debrief = fs::dir_ls(base_dir, glob = '*_debrief.csv', invert = TRUE)
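Whichever answer you use to split the file names, reading them in afterwards looks the same; a minimal sketch with base R using the fs paths from above (which are already full paths; the object names data_main and data_debrief are just placeholders):
# read each group of files into a list of data frames
data_main    <- lapply(file_list_non_debrief, read.csv)
data_debrief <- lapply(file_list_debrief, read.csv)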

List.files from web accessible folder

I need to be able to fill a vector with the files in a web accessible folder.
The variables used to build image_dir are set to the following:
msu_path = "http://oer.hpc.msstate.edu/okeanos/"
sub$cruiseID = "EX1504L2"
divespecs$specID = "EX1504L2_20150802T223100_D2_DIVE01_SPEC01GEO/"
The path is then assembled and listed with:
image_dir <- sprintf("%s%s/%s", msu_path, tolower(sub$cruiseID), divespecs$specID)
file_names <- list.files(path = image_dir, pattern = "jpg", ignore.case = TRUE)
which gives image_dir the value
http://oer.hpc.msstate.edu/okeanos/ex1504l2/EX1504L2_20150802T223100_D2_DIVE01_SPEC01GEO/
If I do this exact thing with a path inside my working directory, it works fine. image_dir does receive a valid URL with the settings above, but file_names comes back as an empty character vector instead of a vector of the .jpg files in that WAF.
Thanks for any advice.
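list.files only works on a local (or mounted) file system, so it returns character(0) for a URL. One possible workaround, assuming the WAF serves a plain HTML directory listing (an untested sketch; the href pattern may need adjusting for your server), is to read the listing page and extract the .jpg links yourself:
# read the directory-listing page and pull out the .jpg hrefs
listing <- readLines(image_dir, warn = FALSE)
matches <- regmatches(listing, gregexpr('href="[^"]+\\.jpg"', listing, ignore.case = TRUE))
file_names <- gsub('href="|"', "", unlist(matches))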

glob2rx, placing a wildcard in the middle of an expression and specifying exceptions, r

I am writing an R script that performs a function for all files in a series of subdirectories. I have run into a problem where several files in these subdirectories are matched by my glob2rx pattern, and I need help refining the pattern so I can select the file I want.
Here is an example of my directory structure:
subdir1
file1_aaa_111_subdir1.txt
file1_bbb_111_subdir1.txt
file1_aaa_subdir1.txt
subdir2
file1_aaa_111_subdir2.txt
file1_bbb_111_subdir2.txt
file1_aaa_subdir2.txt
I want to select the last file in each directory, although in my actual directories its position varies. I want to use something like:
inFilePaths = list.files(path=".", pattern=glob2rx("*aaa*.txt"), full.names=TRUE)
but I don't get any files. Looking at this pattern, I would in theory get both the first and last file in each directory, meaning I need to write an exception to exclude the aaa_111 files and keep the aaa_subdir files.
There is a second option I have been thinking about but lack the ability to realize. Notice that the name of the subdirectory is at the end of each file name. Is it possible to extract the directory name, combine it with a glob2rx pattern, and so directly specify which file I want? Like this:
library(readr)  # for read_tsv used below

#list all the subdirectories
subDirsPaths = list.dirs(path=".", full.names=TRUE)
#perform a function on these directories one by one
for (subDirsPath in subDirsPaths){
#make the subdirectory the working directory
setwd("/home/phil/Desktop/working")
setwd(paste(subDirsPath, sep=""))
# get the working directory name, and trim the "./" from it
directory <- gsub("./", "", paste(subDirsPath, sep=""))
# attempt to get the desired file by pasting the directory name into the glob2rx function
inFilePaths = list.files(path=".", pattern=glob2rx("*aaa_", print(directory,".txt")), full.names=TRUE)
for (inFilePath in inFilePaths)
{
inFileData <- read_tsv(inFilePath, col_names=TRUE)
}
}
With some modification the second option worked well. I ended up using paste in combination with print as follows:
inFilePaths = list.files(path=".", pattern=glob2rx(print(paste("*", "aaa_", directory, ".txt", sep=""))), full.names=TRUE)
The paste function combined the text into a single string, which also preserved the wildcard, and glob2rx then converts that string into the pattern for list.files (the print call is not strictly necessary, since print simply returns its argument).
While this doesn't let me place a wildcard in the middle of an expression, which I believe is done using an escape character, and it doesn't address placing exceptions on the wildcard, it works for my purposes.
I hope this helps others in my position.
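For the exclusion part, one alternative is to skip glob2rx entirely: list everything containing aaa with a plain regular expression and then drop the _111_ variants (a sketch assuming the directory layout shown in the question):
all_aaa <- list.files(path = ".", pattern = "aaa.*\\.txt$", full.names = TRUE)
# keep only the files without the "_111_" segment
inFilePaths <- all_aaa[!grepl("_111_", all_aaa)]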
