How to extract a folder's name (a series number ID) in R?

I have a large number of folders containing CSV and htm files. Each folder has a unique folder name as its ID. Is it possible to extract the folder names?

For any directory path string, you can use:
fullpath <- getwd()
directoryname <- basename(fullpath)
For a whole range of directories:
manydirectories <- list.dirs()
directorynames <- basename(manydirectories)
You then only need to drop the "." entry from directorynames to get the folder names within that directory.
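As a worked example, here is a minimal sketch that pairs every CSV file under the current directory with the name of the folder (the ID) it sits in; the recursive walk and the column names are my own additions, not from the question:
# find all CSV files in all subfolders
csv_files <- list.files(".", pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
# the folder name (the ID) is the last component of each file's directory path
ids <- basename(dirname(csv_files))
# one row per file, pairing each file with its folder ID
data.frame(file = csv_files, id = ids)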

Related

Delete files conditionally

I have a series of csv files in a folder and I want to delete them conditionally. The filename formats are like so:
testfile_2020_05_01.csv
testfile_2020_05_02.csv
testfile_2020_05_03.csv
testfile_2020_05_04.csv
testfile_2020_05_05.csv
testfile_2020_05_06.csv
testfile_2020_05_07.csv
testfile_2020_05_08.csv
testfile_2020_05_09.csv
testfile_2020_05_10.csv
testfile_2020_05_11.csv
There are other files in the folder too. I want to delete the above files by specifying the following conditions:
All files whose filename begins with "testfile", ends with ".csv", and is exactly 23 characters long.
Is there a way to do it in R?
I couldn't do it. I tried a lot of things, like matching names ending in ".csv" or beginning with "testfile", but I don't know how to combine multiple conditions in the list.files command to select these files.
Once I can select these files conditionally, I can then delete them in a loop, unless there is a better and faster way of doing so.
Any suggestions/pointers would be greatly appreciated.
Best regards
Deepak
You can use:
# get the full path names for all the files
all_files <- list.files('/path/to/files', full.names = TRUE)
# select files which start with "testfile", end with ".csv"
# and have 23 characters in their name
files_to_delete <- all_files[grepl('^testfile.*\\.csv$', basename(all_files)) &
                             nchar(basename(all_files)) == 23]
# delete the files; file.remove is vectorised, so no loop is needed
file.remove(files_to_delete)
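Since the date portion of the names has a fixed shape, the length condition can alternatively be folded into a single regular expression passed straight to list.files. A minimal sketch, assuming the names always follow the testfile_YYYY_MM_DD.csv layout shown above:
# testfile_ + 4-digit year + 2-digit month + 2-digit day + .csv
# is exactly 23 characters, so no separate nchar() check is needed
files_to_delete <- list.files('/path/to/files',
                              pattern = '^testfile_[0-9]{4}_[0-9]{2}_[0-9]{2}\\.csv$',
                              full.names = TRUE)
file.remove(files_to_delete)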

How can I list all files with the .nc (NetCDF) extension in a folder and extract 1 variable out of 10?

My task is to get multiple similar NetCDF (.nc) files from a folder and stack one variable out of the 10 variables in each file.
I used:
a <- list.files(path=ncpath, pattern = "nc$", full.names = TRUE)
This gets me all the files with the .nc extension.
How do I proceed with the second task?
I want to pull this one variable from each of these files in the folder and stack them.
If you just want the output in a NetCDF file, you might consider doing this task from the Linux command line using CDO:
files=$(ls *.nc) # you might want to be more selective with your wildcard
# this loop selects the variable from each file and puts into file with the var name at the end
for file in $files ; do
cdo selvar,varname $file ${file%???}_varname.nc
done
# now merge into one file:
cdo merge *_varname.nc merged_file.nc
where you obviously need to replace varname with the name of your variable of choice.
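To do the same thing without leaving R, here is a minimal sketch using the ncdf4 package; the variable name "varname" is a placeholder, and it assumes the variable has identical dimensions in every file:
library(ncdf4)
files <- list.files(path = ncpath, pattern = "\\.nc$", full.names = TRUE)
# read the chosen variable from each file
var_list <- lapply(files, function(f) {
  nc <- nc_open(f)
  on.exit(nc_close(nc))     # close the file even if reading fails
  ncvar_get(nc, "varname")  # placeholder variable name -- replace with yours
})
# stack the per-file arrays along a new trailing dimension
stacked <- simplify2array(var_list)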

list.files from a web-accessible folder

I need to be able to fill a vector with the files in a web-accessible folder.
The variables used to build image_dir are set as follows:
msu_path = "http://oer.hpc.msstate.edu/okeanos/"
sub$cruiseID = "EX1504L2"
divespecs$specID = "EX1504L2_20150802T223100_D2_DIVE01_SPEC01GEO/"
image_dir <- sprintf("%s%s/%s", msu_path, tolower(sub$cruiseID), divespecs$specID)
file_names <- list.files(path = image_dir, pattern = "jpg", ignore.case = TRUE)
so that image_dir ends up as http://oer.hpc.msstate.edu/okeanos/ex1504l2/EX1504L2_20150802T223100_D2_DIVE01_SPEC01GEO/
If I do this exact thing with a path inside my working directory, it works fine. image_dir does receive a valid URL with the settings above, but file_names comes back as an empty character vector instead of a vector of the .jpg files in that WAF.
Thanks for any advice.
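For what it's worth, list.files only reads local filesystem paths, which is why it quietly returns character(0) for a URL. One possible workaround is to fetch the folder's HTML index and extract the .jpg links from it; a minimal base-R sketch, assuming the server produces a plain href-based directory listing:
# download the HTML directory listing of the web-accessible folder
index_html <- readLines(image_dir, warn = FALSE)
# pull out every href ending in .jpg (assumed listing format)
matches <- regmatches(index_html,
                      gregexpr('href="[^"]+\\.jpg"', index_html, ignore.case = TRUE))
# strip the href=" wrapper to leave bare file names
file_names <- gsub('href="|"', '', unlist(matches))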

glob2rx: placing a wildcard in the middle of an expression and specifying exceptions in R

I am writing an R script that performs a function for all files in a series of subdirectories. I have run into a problem where several files in these subdirectories are being matched by my glob2rx pattern, and I need help refining the pattern so I can select the file I want.
Here is an example of my directory structure:
subdir1
file1_aaa_111_subdir1.txt
file1_bbb_111_subdir1.txt
file1_aaa_subdir1.txt
subdir2
file1_aaa_111_subdir2.txt
file1_bbb_111_subdir2.txt
file1_aaa_subdir2.txt
I want to select the last file in each directory, although in my actual directories its position varies. I want to use something like:
inFilePaths <- list.files(path = ".", pattern = glob2rx("*aaa*.txt"), full.names = TRUE)
but I don't get any files. Looking at this pattern, I would in theory get both the first and last file in each directory, meaning I need to write an exception to exclude the aaa_111 files and keep the aaa_subdir files.
There is a second option I have been thinking about but lack the ability to realize. Notice that the name of the subdirectory is at the end of each file name. Is it possible to extract the directory name, combine it with a glob2rx pattern, and then directly specify which file I want? Like this:
# list all the subdirectories
subDirsPaths <- list.dirs(path = ".", full.names = TRUE)
# perform a function on these directories one by one
for (subDirsPath in subDirsPaths) {
  # make the subdirectory the working directory
  setwd("/home/phil/Desktop/working")
  setwd(subDirsPath)
  # get the working directory name, and trim the leading "./" from it
  directory <- gsub("./", "", subDirsPath, fixed = TRUE)
  # attempt to get the desired file by pasting the directory name into the glob2rx pattern
  inFilePaths <- list.files(path = ".", pattern = glob2rx("*aaa_", print(directory, ".txt")), full.names = TRUE)
  for (inFilePath in inFilePaths) {
    # read_tsv() comes from the readr package
    inFileData <- read_tsv(inFilePath, col_names = TRUE)
  }
}
With some modification the second option worked well. I ended up using paste in combination with print as follows:
inFilePaths <- list.files(path = ".", pattern = glob2rx(print(paste("*", "aaa_", directory, ".txt", sep = ""))), full.names = TRUE)
The paste function combined the text into a single string, which also preserved the wildcard, and that string was passed to list.files as the glob2rx pattern. (The print call simply returns its argument here, so paste alone would be enough.)
While this doesn't let me place a wildcard in the middle of an expression, which I believe is done using an escape character, and it doesn't address placing exceptions on the wildcard, it works for my purposes.
I hope this helps others in my position.
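If you do need an exception on the wildcard (keep the aaa files but drop the aaa_111 ones), one option is to list broadly and then filter with grepl; a minimal sketch under the file-naming scheme shown above:
# list every .txt file in every subdirectory
all_txt <- list.files(path = ".", pattern = "\\.txt$", recursive = TRUE, full.names = TRUE)
# keep names containing "_aaa_" but exclude the "_aaa_111_" variants
keep <- grepl("_aaa_", basename(all_txt)) & !grepl("_aaa_111_", basename(all_txt))
inFilePaths <- all_txt[keep]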

Recognize arbitrary file extensions in R?

I'm writing a function in R that will take the path name of a folder as its argument and return a vector containing the names of all the files in that folder which have the extension ".pvalues".
myFunction <- function(path) {
# return vector that contains the names of all files
# in this folder that end in extension ".pvalues"
}
I know how to get the names of the files in the folder, like so:
> list.files("/Users/me/myfolder/")
[1] "myfile.txt"
[2] "myfile.txt.a"
[3] "myfile.txt.b"
[4] "myfile.txt.a.pvalues"
[5] "myfile.txt.b.pvalues"
Is there an easy way to identify all the files in this folder that end in ".pvalues"? I cannot assume that the names will start with "myfile". They could start with "yourfile", for instance.
Take a look at ?list.files. You want the pattern argument, which takes a regular expression rather than a glob: list.files(path = '/Users/me/myfolder', pattern = '\\.pvalues$')
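Dropped into the function skeleton from the question, a minimal sketch:
myFunction <- function(path) {
  # return the names of all files in this folder
  # whose names end in the extension ".pvalues"
  list.files(path = path, pattern = "\\.pvalues$")
}
# myFunction("/Users/me/myfolder/") should then return
# "myfile.txt.a.pvalues" "myfile.txt.b.pvalues"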
