R - Find the location of the file

Below is the wrapper function I created to find the file location. The function works, but I would like to know if there is any simpler solution than this.
The purpose of this function is to find the folder that contains the file. Since list.files returns the directory together with the file name, I can't use its result as input for setwd().
setwd(list.files(fileName)) will not work
Questions:
Is there any function that returns the folder, so that I don't have to create a wrapper function?
How can I find the last "/" in a string? I played with regexpr("\\\[^\\.]*$", Dir) and kept getting an error.
Any answers or feedback are greatly appreciated.
Code:
findFileLocation <- function(FileName, ...) {
  # Find the location of the file
  Dir <- list.files(pattern = FileName, recursive = TRUE)
  #> Dir
  #[1] "10-30/No time line/folderNames.csv"
  positionOfDot <- regexpr("\\.[^\\.]*$", Dir)
  #> positionOfDot
  #[1] 18
  numCharFile <- nchar(FileName)
  #> numCharFile
  #[1] 15
  numCharDir <- nchar(Dir)
  #> numCharDir
  #[1] 21
  fileDir <- substr(Dir, 1, (numCharDir - (numCharFile + 1))) # +1 is to account for the "/"
  fileDir # returns the actual location of the file
}
test <- findFileLocation("folderNames.csv")
From here I can execute the code:
setwd(file.path(mainDir, test))
Note: I have already tried basename and dirname.

Thanks to @MrFlick. The answer is dirname(list.files(pattern = FileName, recursive = TRUE)).
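As a minimal sketch of what dirname() does here, the snippet below applies it to a fixed string instead of a real list.files() result. The path is the one from the question; the variable names found and file_dir are only illustrative.

```r
# Relative path as list.files(recursive = TRUE) might return it
found <- "10-30/No time line/folderNames.csv"

# dirname() keeps everything before the last path separator
file_dir <- dirname(found)
file_dir
# [1] "10-30/No time line"

# basename() keeps everything after it
basename(found)
# [1] "folderNames.csv"
```

The result can then be passed on, for example as setwd(file.path(mainDir, file_dir)).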

Since the first question was already answered, let me answer the second question here:
How can I find the last "/" in a string. I played with regexpr("\\\[^\\.]*$", Dir) and kept getting error.
The error message I get when I try to use this regular expression is:
Error: '\[' is an unrecognized escape in character string starting ""\\\["
The problem reported here is that a third backslash is used (\) where in fact a forward slash (/) was intended. Using regexpr("\\/[^\\.]*$", Dir) instead doesn't throw any errors. However, it doesn't do what was intended, i.e. it does not find the last forward slash. This is because this regular expression searches for forward slashes that are not followed by any dots (.), where in fact the idea was to search for forward slashes that are not followed by any (more) forward slashes.
Thus, the correct regular expression for the described use case is regexpr("\\/[^\\/]*$", Dir).
Dir <- "10-30/No time line/folderNames.csv"
regexpr("\\/[^\\/]*$", Dir)
# returns 19
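Once the position of the last slash is known, it can be used to split the path. The following is a small sketch along those lines; the variable names are illustrative, and it relies on the fact that "/" is not a special character in R regular expressions, so the escaping in "\\/" is optional.

```r
Dir <- "10-30/No time line/folderNames.csv"

# Position of the last "/" (the slash does not need to be escaped)
last_slash <- as.integer(regexpr("/[^/]*$", Dir))
last_slash
# [1] 19

# Alternative without a regular expression: largest position of any "/"
last_slash_fixed <- max(gregexpr("/", Dir, fixed = TRUE)[[1]])

# Split the path at that position
folder <- substr(Dir, 1, last_slash - 1)
file_name <- substr(Dir, last_slash + 1, nchar(Dir))
folder
# [1] "10-30/No time line"
file_name
# [1] "folderNames.csv"
```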

Related

List files that end with pattern and lack an extension

I have a directory with multiple subdirectories that contain files.
The files themselves have no extension; however, each file has an additional header file with the extension ".hdr".
In R, I want to list all file names that contain the string map_masked and end with the pattern "masked", but I only want the files without an extension (the ones that end with the pattern, not the header files).
As suggested in this answer, I tried to use the $ sign to indicate the pattern should occur at the end of a line.
This is the code I used:
dir <- "/my/directory"
list.files(dir, pattern = "map_masked|masked$", recursive = TRUE)
The output, however, looks as follows:
[1] "subdirectory/something_map_masked_something_masked"
[2] "subdirectory/something_map_masked_something_masked.hdr"
etc.
Now, how do I tell R to exclude the files that have an ".hdr" extension?
I am aware this could easily be done by applying a filter on the output, but I would rather like to know what is wrong with my code and understand why R behaves the way it does in this case.
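In your pattern, | means alternation: "map_masked|masked$" matches any name that contains map_masked or ends with masked, so the .hdr files match through the first alternative. You can use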
list.files(dir, pattern = "map_masked.*masked$", recursive = TRUE)
It returns the paths of files whose names contain map_masked and end with masked.
Details:
map_masked - a fixed string
.* - any zero or more characters, as many as possible
masked - the literal substring masked
$ - end of string.
See the regex demo.
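The difference between the two patterns can be checked with grepl() on a few file names, without touching the file system. This is only a sketch: the first two names come from the question's output, and "other_masked" is a made-up name added to show the second alternative of the original pattern.

```r
files <- c(
  "something_map_masked_something_masked",
  "something_map_masked_something_masked.hdr",
  "other_masked"  # hypothetical name, not from the question
)

# Alternation: contains "map_masked" OR ends with "masked"
alt <- grepl("map_masked|masked$", files)
alt
# [1] TRUE TRUE TRUE

# Sequence: contains "map_masked" AND later ends with "masked"
seq_match <- grepl("map_masked.*masked$", files)
seq_match
# [1]  TRUE FALSE FALSE
```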

R: How to match a forward-slash in a regular expression?

How do I match on a forward slash / in a regular expression in R?
As demonstrated in the example below, I am trying to search for .csv files in a subdirectory and my attempts to use a literal / are failing. Looking for a modification to my regex in base R, not a function that does this for me.
Example subdirectory
# Create subdirectory in current working directory with two .csv files
# - remember to delete these later or they'll stay in your current working directory!
dir.create(path = "example")
write.csv(data.frame(x1 = letters), file = "example/example1.csv")
write.csv(data.frame(x2 = 1:20), file = "example/example2.csv")
Get relative paths of all .csv files in the example subdirectory
# This works for the example, but could mistakenly return paths to other files based on:
# (a) file name: foo/example1.csv
# (b) subdirectory name: example_wrong/foo.csv
list.files(pattern = "example.*csv", recursive = TRUE)
#> [1] "example/example1.csv" "example/example2.csv"
# This fixes issue (a) but doesn't fix issue (b)
list.files(pattern = "^example.*?\\.csv$", recursive = TRUE)
#> [1] "example/example1.csv" "example/example2.csv"
# Adding / to the end of `example` guarantees we get the correct subdirectory
# Doesn't work: / is special regex and not escaped
list.files(pattern = "^example/.*?\\.csv$", recursive = TRUE)
# Doesn't work: escapes / but throws error
list.files(pattern = "^example\/.*?\\.csv$", recursive = TRUE)
# Doesn't work: even with the \\ escaping in R!
list.files(pattern = "^example\\/.*?\\.csv$", recursive = TRUE)
Some of the solutions above work with regex tools but not in R. I've checked SO for solutions (most related below) but none seem to apply:
Escaping a forward slash in a regular expression
Regex string does not start or end (or both) with forward slash
Reading multiple csv files from a folder with R using regex
The pattern argument is only used for matching file (or directory) names, not the full path they are on (even when recursive and full.names are set to TRUE). That's why your last approach doesn't work even though it is the correct way to match / in a regular expression. You can get the correct file names by specifying path and setting full.names to TRUE.
list.files(path = 'example', pattern = '\\.csv$', full.names = TRUE)
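The behaviour described above can be reproduced in a throwaway directory. The sketch below recreates the example files under a temporary root (the names root, no_match, via_path and via_grepl are illustrative) and compares three approaches; it also shows that an unescaped / is fine in an R regular expression once the pattern is applied to the relative path with grepl().

```r
# Recreate the example under a temporary root so nothing is left behind
root <- tempfile("slash_demo_")
dir.create(file.path(root, "example"), recursive = TRUE)
write.csv(data.frame(x1 = letters), file = file.path(root, "example", "example1.csv"))
write.csv(data.frame(x2 = 1:20), file = file.path(root, "example", "example2.csv"))

# `pattern` only sees "example1.csv", never "example/example1.csv"
no_match <- list.files(root, pattern = "^example/.*\\.csv$", recursive = TRUE)

# Option 1: point `path` at the subdirectory and match the file name only
via_path <- list.files(file.path(root, "example"), pattern = "\\.csv$",
                       full.names = TRUE)

# Option 2: list everything, then filter the relative paths with grepl()
all_files <- list.files(root, recursive = TRUE)
via_grepl <- all_files[grepl("^example/.*\\.csv$", all_files)]

unlink(root, recursive = TRUE)
```

Here no_match is empty, while via_path and via_grepl each contain the two .csv files.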

Use R fs::dir_ls to match the beginning of file name?

I'm trying to use fs::dir_ls() to return the same results as the list.files() example below. Ultimately, I'm just trying to return files that start with a specific pattern.
path <- "./path/to/files"
pattern <- "^ABC_.*\\.csv$"
# list files returns the expected output
list.files(path = path, pattern = pattern, full.names = T)
# [1] "path/to/files/ABC_1312.csv"
# [2] "path/to/files/ABC_ACAB.csv"
# dir_ls does not return any matching files
fs::dir_ls(path = path, regexp = pattern)
# character(0)
I think the issue here is that the scope of each method's pattern argument differs. The list.files() pattern is only applied to the basename() of the file path, whereas the dir_ls() regexp argument is applied to the full path. As a result, ^ anchors to the start of the path instead of the start of each file name. Is there a way to limit the scope of dir_ls() so that it only matches patterns on the basename() of each file, similar to list.files()? Any other insights are appreciated!
See this issue on GitHub:
you need to modify your regular expression to match the full path then, or use a filtering function that only looks at the basename.
Use
pattern <- paste0(.Platform$file.sep, "ABC_.*\\.csv$")
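You can also do something like the following, provided the leading ^ is dropped from pattern (otherwise the anchor ends up in the middle of the combined expression and cannot match):
regexp = fs::path(path, pattern)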
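The "filtering function that only looks at the basename" mentioned in the quote can be sketched in base R. In the snippet below, paths is a plain character vector standing in for the output of fs::dir_ls(path); the third entry is a made-up file name added to show that the ^ anchor now applies to the file name.

```r
# Stand-in for the result of fs::dir_ls(path); third entry is hypothetical
paths <- c(
  "path/to/files/ABC_1312.csv",
  "path/to/files/ABC_ACAB.csv",
  "path/to/files/XYZ_ABC_1.csv"
)
pattern <- "^ABC_.*\\.csv$"

# Apply the pattern to the file names only, then keep the matching full paths
matched <- paths[grepl(pattern, basename(paths))]
matched
# [1] "path/to/files/ABC_1312.csv" "path/to/files/ABC_ACAB.csv"
```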

R, obtain complete file path string in files names in Windows (spaces and more)

Certainly an old issue, but I was not able to find a solution (maybe there is none). On Unix it is straightforward to use the R function file.path to obtain the path to some file. How can the same thing be done on Windows, where paths containing spaces are returned in a shortened form with ~?
If I need to write, say the path to Rscript.exe to a file, this would work on unix:
x <- list.files(R.home("bin"), full.names = T, pattern = "Rscript")
writeLines(x, con = "path_to_rscript.txt")
On Windows the result is:
C:/PROGRA~1/R/R-35~1.1/bin/x64/Rscript.exe
Where I would have wanted something like:
C:/Program Files/R-3.5.1/bin/x64/Rscript.exe
Is there a way to circumvent this behavior (and what is it with the capitalized PROGRA)?
Indeed, check out normalizePath:
normalizePath(path, winslash = "\\", mustWork = NA)
which states explicitly:
On Windows it converts relative paths to absolute paths, converts
short names for path elements to long names and ensures the separator
is that specified by winslash. It will match paths case-insensitively
and return the canonical case. UTF-8-encoded paths not valid in the
current locale can be used.
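Applied to the example from the question, this might look as follows. It is a sketch rather than a tested Windows recipe: winslash = "/" is chosen here to keep forward slashes as in the desired output, mustWork = FALSE is used so that a missing file does not raise an error, and the result is written to a temporary file (out_file, an illustrative name) rather than the working directory.

```r
# Paths to the Rscript executable(s) in R's bin directory
x <- list.files(R.home("bin"), full.names = TRUE, pattern = "Rscript")

# On Windows this expands short (8.3) names such as PROGRA~1 to long names;
# on other platforms it simply returns the absolute path
long_path <- normalizePath(x, winslash = "/", mustWork = FALSE)

out_file <- tempfile(fileext = ".txt")
writeLines(long_path, con = out_file)
readLines(out_file)
```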

How to modify i in an R loop?

I have several large R objects saved as .RData files: "this.RData", "that.RData", "andTheOther.RData" and so on. I don't have enough memory, so I want to load each in a loop, extract some rows, and unload it. However, once I load(i), I need to strip the ".RData" part of (i) before I can do anything with objects "this", "that", "andTheOther". I want to do the opposite of what is described in How to iterate over file names in a R script? How can I do that? Thx
Edit: I omitted to mention the files are not in the working directory and have a filepath as well. I came across Getting filename without extension in R and file_path_sans_ext takes out the extension but the rest of the path is still there.
Do you mean something like this?
i <- c("/path/to/this.RData", "/another/path/to/that.RData")
f <- gsub(".*/([^/]+)", "\\1", i)
f1 <- gsub("\\.RData$", "", f)
f1
[1] "this" "that"
For Windows paths that use backslashes, match "\\\\" in the regexp instead of "/".
Edit: Explanation. Technically, these are called "regular expressions" (regexps), not "patterns".
. any character
.* an arbitrary number (including 0) of any characters
.*/ an arbitrary number of any characters, followed by a /
[^/] any character except /
[^/]+ one or more characters, none of which is /
( and ) enclose groups. You can refer to the groups in the replacement as \\1, \\2, etc.
So: look for any characters, followed by /, followed by anything except the path separator. Replace the whole match with the "anything except the separator" part.
There are many good tutorials on regexps; just search for one.
A simple way to do this would be to extract the base name from the file paths with base::basename() and then remove the file extension with tools::file_path_sans_ext().
paths_to_files <- c("./path/to/this.RData", "./another/path/to/that.RData")
tools::file_path_sans_ext(basename(paths_to_files))
## Returns:
## [1] "this" "that"
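Putting this together with the loop from the question might look like the sketch below. It assumes, as the question describes, that each .RData file contains one object named after the file. The two small data frames, the temporary directory and the names paths, first_rows and obj_name are made up for the demonstration. Loading into a fresh environment keeps the large object out of the global workspace, so it can be dropped after the rows are extracted.

```r
# Create two small example .RData files in a temporary directory
tmp <- tempfile("rdata_demo_")
dir.create(tmp)
this <- data.frame(a = 1:5)
that <- data.frame(a = 6:10)
save(this, file = file.path(tmp, "this.RData"))
save(that, file = file.path(tmp, "that.RData"))
rm(this, that)

paths <- list.files(tmp, pattern = "\\.RData$", full.names = TRUE)
first_rows <- list()

for (p in paths) {
  # "…/this.RData" -> "this"
  obj_name <- tools::file_path_sans_ext(basename(p))

  # Load into a throwaway environment instead of the global one
  env <- new.env()
  load(p, envir = env)

  # Extract some rows from the object named after the file
  first_rows[[obj_name]] <- head(get(obj_name, envir = env), 2)

  # Drop the environment so the loaded object can be garbage collected
  rm(env)
}

unlink(tmp, recursive = TRUE)
names(first_rows)
```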
