Delete files conditionally - R

I have a series of csv files in a folder that I want to delete conditionally. The filenames follow this format:
testfile_2020_05_01.csv
testfile_2020_05_02.csv
testfile_2020_05_03.csv
testfile_2020_05_04.csv
testfile_2020_05_05.csv
testfile_2020_05_06.csv
testfile_2020_05_07.csv
testfile_2020_05_08.csv
testfile_2020_05_09.csv
testfile_2020_05_10.csv
testfile_2020_05_11.csv
There are other files in the folder too. I want to delete the above files by specifying the following conditions:
All files with filename beginning with "testfile" and ending with ".csv" with the length of the filename equal to 23.
Is there a way to do it in R?
I couldn't do it - I tried a lot of things, like matching names ending in ".csv" or beginning with "testfile", but I don't know how to combine multiple conditions with the list.files command to select these files.
Once I can select these files conditionally, I can then delete them using a loop, unless there is a better and faster way of doing so.
Any suggestions/pointers would be greatly appreciated.
Best regards
Deepak

You can use:
# Get the full path for all files in the folder
all_files <- list.files('/path/to/files', full.names = TRUE)
# Select files which start with "testfile", end with ".csv",
# and have 23 characters in their name
files_to_delete <- all_files[grepl('^testfile.*\\.csv$', basename(all_files)) &
                             nchar(basename(all_files)) == 23]
# Delete the files
file.remove(files_to_delete)
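Since the prefix "testfile" is 8 characters and the suffix ".csv" is 4, a 23-character name leaves exactly 11 characters in between, so all three conditions can be folded into a single pattern passed straight to list.files. A minimal sketch (using the same '/path/to/files' placeholder):

```r
# "testfile" (8) + 11 characters in the middle + ".csv" (4) = 23 characters total
files_to_delete <- list.files('/path/to/files',
                              pattern = '^testfile.{11}\\.csv$',
                              full.names = TRUE)
file.remove(files_to_delete)
```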

Related

List files pattern with "AND" condition in R

I want to list files whose names include a character string, while excluding temporary files that contain the same string.
For example:
Files <- list.files(pattern= "Coca_cola" & "^[^~]")
I found on previous topics on this site that I have to use the "grep" function but I don't know how to use it in this context.
You can select the files that start with "Coca_cola", which will also drop the temporary files that begin with "~":
Files <- list.files(pattern= "^Coca_cola")
Would this do the trick?
Files <- list.files()
Files <- Files[grepl("Coca_cola", Files) & !grepl("~", Files, fixed = TRUE)]

Selectively reading in csv files in R where filenames don't end in a particular suffix

I have a folder of csv files from an experiment that I want to read into R for analysis. The folder contains two files for every participant, with filenames following the pattern:
"participantID.csv"
"participantID_debrief.csv"
I want to create two variables in R, one for the standard data files and one for the debrief files. I have found the list.files function and see that the standard way to use this would be like:
files <- list.files(path="D:/data", pattern=".csv")
But I want to use the pattern parameter to match first only the filenames that don't end in "_debrief.csv" and then only the ones that do. How would I write the regular expression (assuming that's what pattern is) to achieve this?
Try:
files = list.files(path="D:/data")
non_debrief = files[!grepl("_debrief.csv", files, fixed = TRUE)]
debrief = files[grepl("_debrief.csv", files, fixed = TRUE)]
For a tidyverse approach, you could use the fs package (https://www.tidyverse.org/blog/2018/01/fs-1.0.0/). Note that globs do not use the regex anchor $:
base_dir = 'D:/data/'
file_list_debrief = fs::dir_ls(base_dir, glob = '*_debrief.csv')
file_list_non_debrief = fs::dir_ls(base_dir, glob = '*_debrief.csv', invert = TRUE)
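A regex-free base R alternative is to match the literal suffix with endsWith() (available since R 3.3) and take the complement. A sketch, using illustrative participant filenames:

```r
# Illustrative names following the participantID / participantID_debrief pattern
files <- c("p01.csv", "p01_debrief.csv", "p02.csv", "p02_debrief.csv")
debrief <- files[endsWith(files, "_debrief.csv")]
non_debrief <- setdiff(files, debrief)
```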

How can I list all files with the .nc (NetCDF) extension in a folder and extract one variable out of ten?

My task is to read multiple similar NetCDF (.nc) files from a folder and stack one variable out of the ten they contain.
I used:
a <- list.files(path=ncpath, pattern = "nc$", full.names = TRUE)
This gets me all the files with the .nc extension.
How do I proceed with the second task?
I want to extract this variable from each of the files in the folder and stack them.
If you just want the output in a NetCDF file, you might consider doing this task from the command line on Linux using CDO:
files=$(ls *.nc) # you might want to be more selective with your wildcard
# this loop selects the variable from each file and puts it into a file with the variable name appended
for file in $files ; do
cdo selvar,varname $file ${file%???}_varname.nc
done
# now merge into one file:
cdo merge *_varname.nc merged_file.nc
where you obviously need to replace varname with the name of your variable of choice.
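If you would rather stay in R: after reading the variable from each file (for example with the ncdf4 package's nc_open and ncvar_get - an assumption, since any NetCDF reader would do), the per-file arrays can be stacked along a new dimension with simplify2array. A minimal sketch of the stacking step, using small matrices as stand-ins for the real fields:

```r
# Stand-ins for the values ncvar_get() would return, one matrix per file
per_file <- list(matrix(1:4, 2, 2), matrix(5:8, 2, 2), matrix(9:12, 2, 2))
# Bind along a new third dimension: result is 2 x 2 x 3, with files along dim 3
stacked <- simplify2array(per_file)
dim(stacked)
```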

glob2rx, placing a wildcard in the middle of an expression and specifying exceptions, R

I am writing an R script that performs a function on all files in a series of subdirectories. I have run into a problem where several files in these subdirectories match my glob2rx pattern, and I need help refining the pattern so I can select the file I want.
Here is an example of my directory structure:
subdir1
file1_aaa_111_subdir1.txt
file1_bbb_111_subdir1.txt
file1_aaa_subdir1.txt
subdir2
file1_aaa_111_subdir2.txt
file1_bbb_111_subdir2.txt
file1_aaa_subdir2.txt
I want to select the last file in each directory, although in my actual directories its position varies. I want to use something like:
inFilePaths = list.files(path=".", pattern=glob2rx("*aaa*.txt"), full.names=TRUE)
but I don't get any files. In theory, this pattern would match both the first and last file in each directory, which means I need to write an exception to exclude the aaa_111 files and keep the aaa_subdir files.
There is a second option I have been thinking about, but lack the ability to realize it. Notice that the name of the subdirectory is at the end of each file name. Is it possible to extract the directory name, combine it with a glob2rx pattern, and then directly specify which file I want? Like this:
#list all the subdirectories
subDirsPaths = list.dirs(path=".", full.names=TRUE)
#perform a function on these directories one by one
for (subDirsPath in subDirsPaths){
#make the subdirectory the working directory
setwd("/home/phil/Desktop/working")
setwd(paste(subDirsPath, sep=""))
# get the working directory name, and trim the "./" from it
directory <- gsub("./", "", paste(subDirsPath, sep=""))
# attempt to the get the desired file by pasting the directory name into the glob2rx funtion
inFilePaths = list.files(path=".", pattern=glob2rx("*aaa_", print(directory,".txt")), full.names=TRUE)
for (inFilePath in inFilePaths)
{
inFileData <- read_tsv(inFilePath, col_names=TRUE)
}
}
With some modification the second option worked well. I ended up using paste as follows:
inFilePaths = list.files(path=".", pattern=glob2rx(paste("*", "aaa_", directory, ".txt", sep="")), full.names=TRUE)
The paste function combines the pieces into a single string, preserving the wildcard, and the result is passed to glob2rx as the pattern. (The print call in my earlier attempt was unnecessary - print simply returns its argument.)
While this doesn't let me place a wildcard in the middle of an expression, which I believe is done using an escape character, and it doesn't address placing exceptions on the wildcard, it works for my purposes.
I hope this helps others in my position.
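The same idea can also be written without changing the working directory at all. A sketch assuming the layout above, where each subdirectory's name ends the target filename:

```r
# List immediate subdirectories (names only, so no "./" prefix to strip)
subdirs <- list.dirs(path = ".", recursive = FALSE, full.names = FALSE)
for (d in subdirs) {
  # e.g. for "subdir1" this builds the regex for the glob "*aaa_subdir1.txt"
  pattern <- glob2rx(paste0("*aaa_", d, ".txt"))
  in_file_paths <- list.files(path = d, pattern = pattern, full.names = TRUE)
}
```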

Using R to list all files with a specified extension

I'm very new to R and am working on updating an R script to iterate through a series of .dbf tables created using ArcGIS and produce a series of graphs.
I have a directory, C:\Scratch, that will contain all of my .dbf files. However, when ArcGIS creates these tables, it also includes a .dbf.xml file. I want to remove these .dbf.xml files from my file list and thus my iteration. I've tried searching and experimenting with regular expressions to no avail. This is the basic expression I'm using (Excluding all of the various experimentation):
files <- list.files(pattern = "dbf")
Can anyone give me some direction?
files <- list.files(pattern = "\\.dbf$")
The $ at the end anchors the match to the end of the string. "dbf$" will work too, but adding \\. (. is a special character in regular expressions, so you need to escape it) ensures that you match only files with the extension .dbf (in case you have e.g. .adbf files).
Try this, which uses globs rather than regular expressions, so it will only pick out the file names that end in .dbf:
filenames <- Sys.glob("*.dbf")
Peg the pattern to find "\\.dbf" at the end of the string using the $ character:
list.files(pattern = "\\.dbf$")
This gives you the list of files with their full paths:
Sys.glob(file.path(file_dir, "*.dbf")) ## file_dir = the directory containing the files
I am not very good at using sophisticated regular expressions, so I'd do such a task in the following way:
files <- list.files()
dbf.files <- files[!grepl(".xml", files, fixed=TRUE)]
The first line just lists all files in the working directory. The second drops everything containing ".xml". (Subsetting with !grepl is safer than the -grep(...) idiom: if nothing matches, grep returns integer(0), and subsetting with an empty index drops every file.)
The fixed argument makes grepl perform crude literal matching rather than Perl-style fancy regular expressions, which may otherwise cause surprises.
I'm aware that such solution simply reflects drawbacks in my education, but for a novice it may be useful =) at least it's easy.
