"last directory" from url as string - r

I would like to return "blah".
url <- "https://www.example.com/apples/pears/blah.csv"
I can get blah but I feel like I'm writing more lines of code than I should. Example, to get blah.csv I can do:
url_split <- str_split(url, "/")
dirname <- url_split[[1]][length(url_split[[1]])]
This gives me "blah.csv", where I can do a very similar code block as above to get "blah" by calling str_split again.
Is there a more sophisticated one liner to get the last directory in the url minus ".csv"?

fn <- basename("https://www.example.com/apples/pears/blah.csv")
gsub("\\..*$", "", fn)

Related

Concatenate string with escape characters to bookend string with "\"... \"" in R

I have a character vector of file paths that look like this:
xx <- c("data/lsa_two_isl_prosp_u.csv",
"data/lsa_two_isl_prosp_d.csv" ,
"data/lsa_two_isl_propsuit_u.csv")
However, I need these file paths to have "" concatenated on to the beginning of the string and "" concatenated onto the end, so that my string looks like this:
xx <- c("\"data/lsa_two_isl_prosp_u.csv\"",
"\"data/lsa_two_isl_prosp_d.csv\"" ,
"\"data/lsa_two_isl_propsuit_u.csv\"")
Normally I would use paste but the "\"... \"" are escape characters that need each other to 'bookend' a string.
In hindsight, an obviously doomed idea, but sharing to avoid anyone else who might try: If I try yo use paste('"\"', xx, '\""') , I get "\"\" data/lsa_two_isl_prosp_d.csv \"\"" , which is obviously wrong, and I cannot remove the excess portions of the string without throwing out all of it, incase you may have the same idea...
Any suggestions?
Found the answer after a lot of trial and error:
xx <- paste("\"", xx, "\"")

How to Change Part of URL With a Function Input in R?

Let's say we have a url in R like:
url <- 'http://google.com/maps'
And the objective is to change the 'maps' part of it. I'd like to write a function where basically I can just input something (e.g. 'maps', 'images'), etc., and the relevant part of the url will automatically change to reflect what I'm typing in.
Is there a way to do this in R, where part of the url can be changed by typing something into a function?
Thanks!
You have to store the part you type into a variable and paste this to the base URL:
base_url <- "http://google.com/"
your_extension <- "maps"
paste0(base_url, your_extension)
[1] "http://google.com/maps"
If you have to start with a fixed URL, use sub to replace the last part:
sub("\\w+$", 'foo', url)
# "http://google.com/foo"
You can use dirname to remove the last part of the URL and paste it with additional custom string.
change_url_part <- function(base_url, string) {
paste(dirname(base_url), string, sep = '/')
}
change_url_part('http://google.com/maps', 'images')
#[1] "http://google.com/images"

make file.exists() case insensitive

I have a line of code in my script that checks if a file exists (actually, many files, this one line gets looped for a bunch of different files):
file.exists(Sys.glob(file.path(getwd(), "files", "*name*")))
This looks for any file in the directory /files/ that has "name" in it, e.g. "filename.csv". However, some of my files are named "fileName.csv" or "thisfileNAME.csv". They do not get recognized. How can i make file.exists treat this check in a case insensitive way?
In my other code i usually make any imported names or lists immediately lowercase with the tolower function. But I don't see any option to include that in the file.exists function.
Suggested solution using list.files:
If we have many files we might want to do this only once, otherwise we can put in in the function (and pass path_to_root_directory instead of found_files to the function)
found_files <- list.files(path_to_root_directory, recursive=FALSE)
Behaviour as file.exists (return value is boolean):
fileExIsTs <- function(file_path, found_files) {
return(tolower(file_path) %in% tolower(found_files))
}
Return value is file with spelling as found in directory or character(0) if no match:
fileExIsTs <- function(file_path, found_files) {
return(found_files[tolower(found_files) %in% tolower(file_path)])
}
Edit:
New solution to fit new requirements:
keywordExists <- function(keyword, found_files) {
return(any(grepl(keyword, found_files, ignore.case=TRUE)))
}
keywordExists("NaMe", found_files=c("filename.csv", "morefilenames.csv"))
Returns:
[1] TRUE
Or
Return value are files with spelling as found in directory or character(0) if no match:
keywordExists2 <- function(file_path, found_files) {
return(found_files[grepl(keyword, found_files, ignore.case=TRUE)])
}
keywordExists2("NaMe", found_files=c("filename.csv", "morefilenames.csv"))
Returns:
[1] "filename.csv" "morefilenames.csv"
The following should return a 1 if the filename matches in any case and a 0 if it does not.
max(grepl("*name*",list.files()),ignore.case=T)

Test for exact string in testthat

I'd like to test that one of my functions gives a particular message (or warning, or error).
good <- function() message("Hello")
bad <- function() message("Hello!!!!!")
I'd like the first expectation to succeed and the second to fail.
library(testthat)
expect_message(good(), "Hello", fixed=TRUE)
expect_message(bad(), "Hello", fixed=TRUE)
Unfortunately, both of them pass at the moment.
For clarification: this is meant to be a minimal example, rather than the exact messages I'm testing against. If possible I'd like to avoid adding complexity (and probably errors) to my test scripts by needing to come up with an appropriate regex for every new message I want to test.
You can use ^ and $ anchors to indicate that that the string must begin and end with your pattern.
expect_message(good(), "^Hello\\n$")
expect_message(bad(), "^Hello\\n$")
#Error: bad() does not match '^Hello\n$'. Actual value: "Hello!!!!!\n"
The \\n is needed to match the new line that message adds.
For warnings it's a little simpler, since there's no newline:
expect_warning(warning("Hello"), "^Hello$")
For errors it's a little harder:
good_stop <- function() stop("Hello")
expect_error(good_stop(), "^Error in good_stop\\(\\) : Hello\n$")
Note that any regex metacharacters, i.e. . \ | ( ) [ { ^ $ * + ?, will need to be escaped.
Alternatively, borrowing from Mr. Flick's answer here, you could convert the message into a string and then use expect_true, expect_identical, etc.
messageToText <- function(expr) {
con <- textConnection("messages", "w")
sink(con, type="message")
eval(expr)
sink(NULL, type="message")
close(con)
messages
}
expect_identical(messageToText(good()), "Hello")
expect_identical(messageToText(bad()), "Hello")
#Error: messageToText(bad()) is not identical to "Hello". Differences: 1 string mismatch
Your rexeg matches "Hello" in both cases, thus it doesn't return an error. You''ll need to set up word boundaries \\b from both sides. It would suffice if you wouldn't use punctuations/spaces in here. In order to ditch them too, you'll need to add [^\\s ^\\w]
library(testthat)
expect_message(good(), "\\b^Hello[^\\s ^\\w]\\b")
expect_message(bad(), "\\b^Hello[^\\s ^\\w]\\b")
## Error: bad() does not match '\b^Hello[^\s ^\w]\b'. Actual value: "Hello!!!!!\n"

removing data with tags from a vector

I have a string vector which contains html tags e.g
abc<-""welcome <span class=\"r\">abc</span> Have fun!""
I want to remove these tags and get follwing vector
e.g
abc<-"welcome Have fun"
Try
> gsub("(<[^>]*>)","",abc)
what this says is 'substitute every instance of < followed by anything that isnt a > up to a > with nothing"
You cant just do gsub("<.*>","",abc) because regexps are greedy, and the .* would match up to the last > in your text (and you'd lose the 'abc' in your example).
This solution might fail if you've got > in your tags - but is <foo class=">" > legal? Doubtless someone will come up with another answer that involves parsing the HTML with a heavyweight XML package.
You can convert your piece of HTML to an XML document with
htmlParse or htmlTreeParse.
You can then convert it to text,
i.e., strip all the tags, with xmlValue.
abc <- "welcome <span class=\"r\">abc</span> Have fun!"
library(XML)
#doc <- htmlParse(abc, asText=TRUE)
doc <- htmlTreeParse(abc, asText=TRUE)
xmlValue( xmlRoot(doc) )
If you also want to remove the contents of the links,
you can use xmlDOMApply to transform the XML tree.
f <- function(x) if(xmlName(x) == "span") xmlTextNode(" ") else x
d <- xmlDOMApply( xmlRoot(doc), f )
xmlValue(d)

Resources