Is there an existing function to concatenate paths?
I know it is not that difficult to implement, but still... besides taking care of trailing / (or \) I would need to take care of proper OS path format detection (i.e. whether we write C:\dir\file or /dir/file).
As I said, I believe I know how to implement it; the question is: should I? Does this functionality already exist in an existing R package?
Yes, file.path()
R> file.path("usr", "local", "lib")
[1] "usr/local/lib"
R>
There is also the equally useful system.file() for files in a package:
R> system.file("extdata", "date_time_zonespec.csv", package="RcppBDT")
[1] "/usr/local/lib/R/site-library/RcppBDT/extdata/date_time_zonespec.csv"
R>
which will get the file extdata/date_time_zonespec.csv irrespective of where the package is installed and of the OS, which is very handy. Lastly, there is also
R> .Platform$file.sep
[1] "/"
R>
if you insist on doing it manually.
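A short sketch of how these two functions behave in practice. The file names are made up for illustration, and the stats package is used only because it ships with every R installation.

```r
# file.path() is vectorized and joins with "/" on every platform
csv_paths <- file.path("data", c("a.csv", "b.csv"))

# system.file() returns the full path of a file inside an installed package ...
desc <- system.file("DESCRIPTION", package = "stats")

# ... and an empty string when the file cannot be found
absent <- system.file("no-such-file.txt", package = "stats")
if (!nzchar(absent)) message("file not found in package")
```

Checking the result of system.file() with nzchar() is a cheap way to fail early when a packaged file is missing.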
In case anyone wants it, this is my own function, path.cat. Its functionality is comparable to Python's os.path.join, with the extra sugar that it interprets "..".
With this function you can construct paths hierarchically, but unlike with file.path, the user can override the hierarchy by supplying an absolute path. As added sugar, ".." can be placed anywhere in the path, with the obvious meaning.
e.g.
path.cat("/home/user1","project/data","../data2") yields /home/user1/project/data2
path.cat("/home/user1","project/data","/home/user2/data") yields /home/user2/data
The function works only with slashes as the path separator, which is fine, since R transparently translates them to backslashes on Windows machines.
library("iterators") # After writing this function I learned that iterators are very inefficient in R.
library("itertools")
# High-level function that intelligently concatenates the paths given as arguments.
# The user interface is the same as for file.path, except that it understands the path ".."
# and can tell relative paths from absolute ones.
# A path counts as absolute if it starts with "/" or "\", or if it contains a drive-letter colon (e.g. "C:/").
# The concatenation starts from the last absolute path in the arguments, or from the first argument if no absolute path is given.
path.cat <- function(...)
{
  elems <- list(...)
  elems <- as.character(elems)
  elems <- elems[elems != '']                 # drop empty components
  relems <- rev(elems)
  # Position of the last absolute path, counted from the end
  starts <- grep('^[/\\\\]', relems)[1]
  if (!is.na(starts))
  {
    relems <- relems[1:starts]
  }
  # A drive letter (e.g. "C:") also marks an absolute path
  starts <- grep(':', relems, fixed=TRUE)
  if (length(starts) == 0) {
    starts <- length(elems) - length(relems) + 1
  } else {
    starts <- length(elems) - starts[[1]] + 1
  }
  elems <- elems[starts:length(elems)]
  path <- do.call(file.path, as.list(elems))
  elems <- strsplit(path, '[/\\\\]', FALSE)[[1]]
  it <- ihasNext(iter(elems))
  out <- rep(NA, length(elems))
  i <- 1
  while (hasNext(it))
  {
    item <- nextElem(it)
    if (item == '..')
    {
      i <- i - 1                              # step back one directory
    } else if (item == '' && i != 1) {
      # skip empty components, except the leading one of an absolute path
    } else {
      out[i] <- item
      i <- i + 1
    }
  }
  do.call(file.path, as.list(out[seq_len(i - 1)]))
}
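For readers who would rather avoid the iterators and itertools dependencies, the same idea can be sketched in base R with a plain for loop. This is not the function above but a hypothetical variant: the name path_cat2 is made up, and unlike the original it keeps a leading ".." in a relative path and detects a drive letter only at the start of an argument.

```r
# Sketch only: path_cat2 is a hypothetical name, not part of any package.
path_cat2 <- function(...) {
  elems <- as.character(c(...))
  elems <- elems[nzchar(elems)]                 # drop empty arguments
  # Restart from the last absolute argument: leading "/" or "\", or a drive letter
  abs_idx <- grep("^([/\\\\]|[A-Za-z]:)", elems)
  if (length(abs_idx) > 0) elems <- elems[max(abs_idx):length(elems)]
  parts <- strsplit(paste(elems, collapse = "/"), "[/\\\\]")[[1]]
  out <- character(0)
  for (k in seq_along(parts)) {
    p <- parts[k]
    last <- if (length(out) > 0) out[length(out)] else NA_character_
    if (p == "..") {
      if (is.na(last) || last == "..") {
        out <- c(out, "..")                     # nothing to pop: keep ".."
      } else if (last != "") {
        out <- out[-length(out)]                # pop the previous directory
      }                                         # at the root, ".." stays at the root
    } else if (p == "" && k != 1) {
      next                                      # skip empty components
    } else {
      out <- c(out, p)                          # a leading "" marks an absolute path
    }
  }
  if (identical(out, "")) "/" else paste(out, collapse = "/")
}

path_cat2("/home/user1", "project/data", "../data2")          # "/home/user1/project/data2"
path_cat2("/home/user1", "project/data", "/home/user2/data")  # "/home/user2/data"
```

Collecting the components on a growing vector keeps the ".." handling explicit: each ".." either removes the previous component or, when there is nothing left to remove in a relative path, is kept as is.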
I have a script in which I call R and depending on the directory I specify I want it to carry out a different process. One directory starts with L and the other with S. I have numerous directories that either start with L or S and they all end differently.
I specify the directory in bash and run a script like so:
./script L_dir
or
./script S_dir
So within my R script I have it set up as such:
args <- commandArgs(TRUE)
img_dir <- args[1]
if(img_dir == "^L*"){
do_process_1
} else {
do_process_2
}
Everything works fine except that no matter what directory I specify, the process called will always be do_process_2.
I have looked at this question and tried to adapt it but can't get it to work.
After changing my code to
if(grepl("^LM*", img_dir)){
do_process_1
} else {
do_process_2
}
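it worked. The original test failed because == compares the two strings literally and never interprets "^L*" as a pattern, so the condition was always FALSE and do_process_2 always ran. Be careful with the pattern itself, too: * is a regex quantifier, not a shell wildcard, so "^L*" means "zero or more L at the start" and matches every string, including a name such as STUVLJH. In my case dir_L = LMNOP and dir_S = STUVLJH, and "^LM*" requires a leading L (followed by zero or more M), so it did what I wanted it to do.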
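A minimal sketch of the prefix test, using the two directory names from this answer. The helper name starts_with_L is made up for illustration; "^L" is enough as a pattern, and base R (3.3.0 or later) also offers startsWith() for the same check without a regex.

```r
# Hypothetical helper: TRUE when the directory name begins with "L"
starts_with_L <- function(img_dir) grepl("^L", img_dir)

starts_with_L("LMNOP")     # TRUE
starts_with_L("STUVLJH")   # FALSE

# "^L*" means "zero or more L at the start", so it matches any string
grepl("^L*", "STUVLJH")    # TRUE

# == compares literally and never treats the right-hand side as a pattern
"L_dir" == "^L*"           # FALSE

# Regex-free alternative
startsWith("L_dir", "L")   # TRUE
```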
I have an R script that takes a file as input, and I want a general way to know whether the input is a file that exists, and is not a directory.
In Python you would do it this way: How do I check whether a file exists using Python?, but I was struggling to find anything similar in R.
What I'd like is something like below, assuming that the file.txt actually exists:
input.good = "~/directory/file.txt"
input.bad = "~/directory/"
is.file(input.good) # should return TRUE
is.file(input.bad) #should return FALSE
R has something called file.exists(), but this doesn't distinguish files from directories.
There is a dir.exists function in all recent versions of R.
file.exists(f) && !dir.exists(f)
The solution is to use file_test()
This gives shell-style file tests, and can distinguish files from folders.
E.g.
input.good = "~/directory/file.txt"
input.bad = "~/directory/"
file_test("-f", input.good) # returns TRUE
file_test("-f", input.bad) #returns FALSE
From the manual:
Usage
file_test(op, x, y)
Arguments
op      a character string specifying the test to be performed. Unary tests (only x is used) are "-f" (existence and not being a directory), "-d" (existence and directory) and "-x" (executable as a file or searchable as a directory). Binary tests are "-nt" (strictly newer than, using the modification dates) and "-ot" (strictly older than): in both cases the test is false unless both files exist.
x, y    character vectors giving file paths.
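The two base R approaches can be compared on a temporary file and a temporary directory. This is a sketch; is_regular_file is a made-up helper name, and file_test() lives in the utils package.

```r
# Hypothetical helper combining file.exists() and dir.exists()
is_regular_file <- function(p) file.exists(p) & !dir.exists(p)

f <- tempfile(fileext = ".txt")   # a file that will exist
d <- tempfile()                   # a directory that will exist
file.create(f)
dir.create(d)

is_regular_file(f)                # TRUE
is_regular_file(d)                # FALSE
utils::file_test("-f", f)         # TRUE
utils::file_test("-f", d)         # FALSE
utils::file_test("-d", d)         # TRUE
```

Both checks are vectorized over paths, so they can be applied to a whole vector of inputs at once.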
You can also use is_file(path) from the fs package.
If I do list.files('~') on Linux I get the contents of my home directory.
If I do list.files('%userprofile%') on Windows, I get an empty character vector in return.
How can I use the special directories in this manner on Windows?
This isn't the same as this question because using ~ in Windows gets me %userprofile%/documents which I don't want. As a plan B I can use that and use string manipulation to take out "/documents" but that seems pretty hacky.
I'm not sure if you would consider this "hacky", but you can try something like:
list.files(dirname(path.expand("~")))
From @nongkrong's comments...
Sys.getenv("USERPROFILE") will return the correct directory. Using Sys.getenv() will work for other special directories too. Fortunately it is possible to mix "\\", which Sys.getenv() returns, with "/" which are more convenient to use for full paths.
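A small sketch of that approach which also runs on other platforms. The helper name user_home and the "Desktop" sub-directory are assumptions for illustration; where USERPROFILE is not set, the sketch falls back to path.expand("~").

```r
# Hypothetical helper: the user profile directory on Windows, the home directory elsewhere
user_home <- function() {
  profile <- Sys.getenv("USERPROFILE", unset = NA)
  if (is.na(profile) || !nzchar(profile)) path.expand("~") else profile
}

list.files(user_home())

# Backslashes from Sys.getenv() can be mixed with the "/" added by file.path()
file.path(user_home(), "Desktop")
```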
I'm looking for a way to prevent R from overwriting files during the session. The more general the solution, the better.
Currently I have a bunch of functions, e.g. safe.save, safe.png, safe.write.table, which are implemented more or less as
safe.smth <- function(..., file) {
if (file.exists(file))
stop("File exists!")
else
smth(..., file=file)
}
It works, but only if I have control over execution. If some function that is not mine creates a file, I can't stop it from overwriting.
Another way is to set the read-only flag on files, which also stops R from overwriting existing files. But this has drawbacks as well (e.g. you don't know which files need to be protected).
Or write a one-liner:
protect <- function(p) if (file.exists(p)) stop("File exists!") else p
and always use it when providing a filename.
Is there a way to force this behaviour session wide? Some kind of global setting for connections? Maybe only for subset of functions (graphics devices, file-created connections, etc)? Maybe some system specific solution?
The following could be used as test case:
test <- function(i) {
try(write.table(i, "test_001.csv"))
try(writeLines(as.character(i), "test_002.txt"))
try({png("test_003.png");plot(i);dev.off()})
try({pdf("test_004.pdf");plot(i);dev.off()})
try(save(i, file="test_005.RData"))
try({f<-file("test_006.txt", "w");cat(as.character(i), file=f);close(f)})
}
test(1)
magic_incantations() # or magic_incantations(test(2)), etc.
test(2) # should fail on all writes (to test set read-only to files from first call)
The conventional way to avoid clobbering data files isn't to look for OS hacks, but to use filenames and directories that are special for your session.
session.dir <- tempdir()
...
write.table(i, file.path(session.dir,"test_001.csv"))
writeLines(as.character(i), file.path(session.dir,"test_002.txt"))
...
or
session.pid <- Sys.getpid()
...
write.table(i, paste0("test_001.",session.pid,".csv"))
writeLines(as.character(i), paste0("test_002.",session.pid,".txt"))
...
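A related sketch, assuming unique names are acceptable: tempfile() generates a file name that is not currently in use, so repeated writes within the session directory cannot collide. The "test_001_" prefix is only an illustration.

```r
session.dir <- tempdir()

# Each call returns a name that does not exist yet in session.dir
out1 <- tempfile(pattern = "test_001_", tmpdir = session.dir, fileext = ".csv")
write.table(1, out1)

# A second call cannot return out1, because that file now exists
out2 <- tempfile(pattern = "test_001_", tmpdir = session.dir, fileext = ".csv")
write.table(2, out2)
```

The trade-off is that the file names are no longer predictable, so the script has to keep the returned paths if it needs to read the files back.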
I wish to get the fully qualified name of a file in R, given any of the standard notations. For example:
file.ext
~/file.ext (this case can be handled by path.expand)
../current_dir/file.ext
etc.
By fully qualified file name I mean, for example, (on a Unix-like system):
/home/user/some/path/file.ext
(Edited - use file.path and attempt Windows support) A crude implementation might be:
path.qualify <- function(path) {
  path <- path.expand(path)
  # Treat a leading "/" or a leading drive letter (e.g. "C:") as already absolute
  if(!grepl("^(/|[A-Za-z]:)", path)) path <- file.path(getwd(),path)
  path
}
However, I'd ideally like something cross-platform that can handle relative paths with ../, symlinks etc. An R-only solution would be preferred (rather than shell scripting or similar), but I can't find any straightforward way of doing this, short of coding it "from scratch".
Any ideas?
I think you want normalizePath():
> setwd("~/tmp/bar")
> normalizePath("../tmp.R")
[1] "/home/gavin/tmp/tmp.R"
> normalizePath("~/tmp/tmp.R")
[1] "/home/gavin/tmp/tmp.R"
> normalizePath("./foo.R")
[1] "/home/gavin/tmp/bar/foo.R"
For Windows, there is the argument winslash, which you might want to set all the time: it is ignored on anything other than Windows, so it won't affect other OSes:
> normalizePath("./foo.R", winslash="\\")
[1] "/home/gavin/tmp/bar/foo.R"
(You need to escape the \ hence the \\) or
> normalizePath("./foo.R", winslash="/")
[1] "/home/gavin/tmp/bar/foo.R"
depending on how you want the path presented/used. The former is the default ("\\") so you could stick with that if it suffices, without needing to set anything explicitly.
In R 2.13.0 the "~/file.ext" case also works (see comments):
> normalizePath("~/foo.R")
[1] "/home/gavin/foo.R"
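A self-contained sketch of the same calls that can be run anywhere, using a file created in the session's temporary directory. The file names are made up; mustWork = FALSE is shown as one way to avoid the warning normalizePath() gives by default for a path that does not exist.

```r
old_wd <- setwd(tempdir())        # work inside the session's temporary directory
file.create("foo.R")

# A relative path to an existing file becomes an absolute one
p <- normalizePath("./foo.R", winslash = "/")

setwd(old_wd)                     # restore the previous working directory

# A path that does not exist: no warning and no error with mustWork = FALSE
q <- normalizePath(file.path(tempdir(), "no-such-file.R"),
                   winslash = "/", mustWork = FALSE)
```

Setting mustWork = TRUE instead turns a missing path into an error, which can be the safer choice in scripts.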
I think I may be missing the point of your question, but hopefully my answer can point you in the direction you want (it combines your idea of using paste and getwd with list.files):
paste(getwd(),substr(list.files(full.names = TRUE), 2,1000), sep ="")
Edit: Works on Windows in some tested folders.