How do I extract a file/folder_name only from a path? - r

Unfortunately I suck at regexp. If I have a path like so:
/long/path/to/file, I just need to extract file.
If someone supplies file/ I just need file.
If someone supplies /file/, I still need just file.
I've been using stringr functions as a crutch but this seems like straight up grep territory. Help, please?

If I understand correctly, you could use the basename function.
f <- "/long/path/to/file"
basename(f)
# [1] "file"
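As a quick check against the three inputs in the question, here is a minimal sketch; the vector name paths is only illustrative. R's documentation for basename states that trailing path separators are removed before the path is dissected, which is why the trailing slashes do not matter here.

```r
# The three path shapes from the question (illustrative vector name).
paths <- c("/long/path/to/file", "file/", "/file/")

# basename() is vectorised and drops trailing separators first.
basename(paths)
# [1] "file" "file" "file"
```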

What about this?
> path <- "/long/path/to/file"
> require(stringr)
> str_extract(path, "[^/]*$")
[1] "file"
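Note that the pattern "[^/]*$" anchors at the very end of the string, so for an input with a trailing slash such as file/ it should match only the empty string. A hedged base R sketch that tolerates trailing slashes follows; last_component is a hypothetical helper name, not part of stringr or base R.

```r
# Hypothetical helper: strip trailing slashes, then drop everything
# up to and including the last remaining slash.
last_component <- function(path) {
  trimmed <- sub("/+$", "", path)  # "file/" -> "file", "/file/" -> "/file"
  sub("^.*/", "", trimmed)         # keep only what follows the last "/"
}

last_component(c("/long/path/to/file", "file/", "/file/"))
# [1] "file" "file" "file"
```

In practice basename already does this, so the regex version is mainly useful for seeing what the pattern has to handle.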

Sorry for answering a very old question, but I was led here while searching for a way to extract only the directory part of a full filename.
Here is how you extract the directory:
> f <- "/long/path/to/file"
> dirname(f)
[1] "/long/path/to"
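dirname and basename are complementary, so the two pieces can be put back together with file.path. A small sketch, where parts and rebuilt are illustrative names:

```r
f <- "/long/path/to/file"

# Split the path into its directory part and its last component.
parts <- c(directory = dirname(f), name = basename(f))

# file.path() joins the pieces with "/" again.
rebuilt <- file.path(parts[["directory"]], parts[["name"]])
identical(rebuilt, f)
# [1] TRUE
```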

Related

Extract only folder name right before filename from full path

I have the following path
filePath <- "/data/folder1/subfolder1/foo.dat"
I'd like to get subfolder1, the folder in which foo.dat is located. I saw solutions in other languages but haven't found one in R. What is the simplest way to do it? Thank you!
What I tried
> basename(filePath)
[1] "foo.dat"
> dirname(filePath)
[1] "/data/folder1/subfolder1"
This may solve it:
filePath <- "/data/folder1/subfolder1/foo.dat"
basename(dirname(filePath))
http://www.r-fiddle.org/#/fiddle?id=IPftVEDk&version=1
This may not be the prettiest answer, but it will work for you:
parts <- unlist(strsplit(filePath, '/'))  # split the path once and reuse it
parts[length(parts) - 1]                  # the element just before the file name

Extract segment of filename

I'm trying to extract a filename and save the dataframe with that same name.
The problem I have is that if the filename for some reason is inside a folder with a similar word, stringr will return that word as well.
filename <- "~folder/testdata/2016/testdata 2016.csv"
If I run this:
library(stringr)
str <- str_trim(stringr::str_extract(filename,"[t](.*)"), "left")
it returns testdata/2016/testdata 2016.csv when all I want is testdata 2016. Ideally, I would get testdata2016.
I've been trying several combinations, but there has to be a simpler way of doing this. If there were a way of reading the path from right to left, starting at .csv and stopping at /, I wouldn't have this issue.
You can use either of the approaches below:
library(stringr)
str_replace(str_extract(filename,"\\w*\\s+\\w*(?=\\.)"),"\\s+","")
str_replace_all(basename(filename),"\\s+|\\.csv","")
You can also use the basename approach, as suggested by Benjamin.
?basename:
basename removes all of the path up to and including the last path
separator (if any).
Output:
[1] "testdata2016"
Plenty of help in base R (tools pkg comes with the default R install):
gsub(" ", "",
     tools::file_path_sans_ext(
       basename("~folder/testdata/2016/testdata 2016.csv")))
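Since the stated goal was to save the data frame under the extracted name, one possible sketch is to store it in a named list. The names clean_name, df, and datasets are assumptions for illustration, and the data frame here is a placeholder rather than the real file contents.

```r
filename <- "~folder/testdata/2016/testdata 2016.csv"

# Drop the directory, the extension, and any spaces.
clean_name <- gsub(" ", "", tools::file_path_sans_ext(basename(filename)))

# Placeholder data; in practice this would come from read.csv(filename).
df <- data.frame(x = 1:3)

# Keep the data frame in a list under the cleaned name.
datasets <- list()
datasets[[clean_name]] <- df
names(datasets)
# [1] "testdata2016"
```

A named list keeps the loaded data frames together, which tends to be easier to manage than creating variables dynamically with assign.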

parsing xml file manually with r

For some reason, I cannot download the R XML package at work. I have an XML file that has contents like this:
x<-read.table("info.xml")
x
</name></content></item><item id="id-123"><content><name>
</name></content></item><item id="id-456"><content><name>
</name></content></item><item id="id-5559"><content><name>
I need to pick out the values that start with id, followed by - and a number, such as
id-123, id-456, id-5559, etc.
I tried this:
str_extract_all(x, "id-[0-9]")
but it only prints id-1. I really need help quickly. Any ideas?
str_extract_all(x, "id-[0-9]+")
The regular expression "id-[0-9]" is missing a "+" at the end.
There may be more issues, but that one jumps out.
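If stringr is also unavailable, the same pattern works in base R with gregexpr and regmatches. This is a sketch under the assumption that the file is read as plain lines (for example with readLines("info.xml")); the xml_lines vector below just reproduces the sample from the question.

```r
# Sample lines copied from the question; with a real file,
# xml_lines <- readLines("info.xml") would take their place.
xml_lines <- c(
  '</name></content></item><item id="id-123"><content><name>',
  '</name></content></item><item id="id-456"><content><name>',
  '</name></content></item><item id="id-5559"><content><name>'
)

# "[0-9]+" matches one or more digits, so the whole number is captured.
ids <- unlist(regmatches(xml_lines, gregexpr("id-[0-9]+", xml_lines)))
ids
# [1] "id-123"  "id-456"  "id-5559"
```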

How can I determine the current directory name in R?

The only solution I've encountered is to use regular expressions and recursively replace the first directory until you get a word with no slashes.
gsub("/\\w*/","/",gsub("/\\w*/","/",getwd()))
Is there anything slightly more elegant? (and more portable?)
Your example code doesn't work for me, but you're probably looking for either basename or dirname:
> getwd()
[1] "C:/cvswork/data"
> basename(getwd())
[1] "data"
> dirname(getwd())
[1] "C:/cvswork"
If you didn't know basename (and I didn't), you could have used this:
tail(strsplit(getwd(), "/")[[1]], 1)

How do I escape or sanitize a slash using regex in R?

I'm trying to read in a (tab-separated) csv file in R. When I try to read the column whose name includes a /, I get an error.
doSomething <- function(dataset) {
a <- dataset$data_transfer.Jingle/TCP.total_size_kb
...
}
The error says that this object cannot be found. I've tried escaping with a backslash, but it did not work.
If anybody has got some idea, I'd really appreciate it!
Run
head(dataset)
and check the name the column has been given. It may be something like:
dataset$data_transfer.Jingle.TCP.total_size_kb
Two ways:
dataset[["data_transfer.Jingle/TCP.total_size_kb"]]
or
dataset$`data_transfer.Jingle/TCP.total_size_kb`
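Both answers are consistent with how read.table and its wrappers treat headers: with the default check.names = TRUE, column names are passed through make.names, which replaces invalid characters such as / with a dot. A minimal sketch follows, where the inline tab-separated text and the column other are made up for illustration.

```r
# Hypothetical tab-separated input with a slash in the first header.
tsv <- "data_transfer.Jingle/TCP.total_size_kb\tother\n10\t1\n20\t2\n"

# Default check.names = TRUE: the slash is turned into a dot.
checked <- read.delim(text = tsv)
names(checked)[1]
# [1] "data_transfer.Jingle.TCP.total_size_kb"

# check.names = FALSE keeps the header exactly as written,
# so the column needs [[ ]] or backticks to be accessed.
raw <- read.delim(text = tsv, check.names = FALSE)
raw[["data_transfer.Jingle/TCP.total_size_kb"]]
raw$`data_transfer.Jingle/TCP.total_size_kb`
```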

Resources