R case-insensitive capturing group

This regex:
str_extract_all("This is a Test , ' ' " , "[a-z]+")
returns:
[1] "his" "is" "a" "est"
How can I modify it so that the match is case insensitive? It should instead return:
`[1] "This" "is" "a" "Test"`
Shouldn't /i make the match case insensitive? Trying
str_extract_all("This is a Test , ' ' " , "[a-z]+/i")
returns:
[[1]]
character(0)

stringr functions accept a pattern wrapped in regex(), which takes matching options:
regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE,
dotall = FALSE, ...)
To ignore case, you may use:
> str_extract_all("This is a Test , ' ' " , regex("[a-z]+", ignore_case=TRUE))
[[1]]
[1] "This" "is" "a" "Test"
Alternatively, use an inline i modifier (?i):
str_extract_all("This is a Test , ' ' " , "(?i)[a-z]+")
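
For comparison, the same case-insensitive extraction can be sketched in base R, without stringr. This is only a sketch; the variable names `x`, `words` and `words_inline` are illustrative and not part of the original answer.

```r
# Base R sketch: case-insensitive extraction without stringr.
x <- "This is a Test , ' ' "

# gregexpr() locates every match; ignore.case = TRUE lets [a-z] match A-Z too.
words <- regmatches(x, gregexpr("[a-z]+", x, ignore.case = TRUE))[[1]]
print(words)

# The inline (?i) flag is also available in base R when perl = TRUE is set.
words_inline <- regmatches(x, gregexpr("(?i)[a-z]+", x, perl = TRUE))[[1]]
print(words_inline)
```

Both calls return a character vector rather than a list because of the trailing `[[1]]`, which picks the matches for the single input string.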

You could try including the capital letters in the set you're searching for.
str_extract_all("This is a Test , ' ' " , "[A-Za-z]+")
If only the first letter of each word may be uppercase, you could try the code below. It allows either case for the first letter and only lowercase letters after it.
str_extract_all("This is a Test , ' ' " , "[A-Za-z][a-z]*")


gsub not working for dataframe's variable R

I do not understand what I am doing wrong.
I have a dataframe and one of the variables looks like this.
ss <- c("F00020 " , "F13975 " , "F13976 " , "F15334 " , "F12490 " , "F09787 " , "F14675 " ,
"F12129 " , "F04641 " , "F04680 " , "F04715 " , "F04753 " , "F08868 " , "F14031 " ,
"F14033 " , "F12585 " , "F14663 ")
I want to omit the extra blank spaces.
gsub("[[:space:]]","",ss)
The above code works, but if I call the variable directly from the dataframe it does not work.
gsub("[[:space:]]","",df$Variable)
I also checked the types of the vector and the variable; both are character vectors.
So what is happening here?
I cannot reproduce your error:
ss <- c("F00020 " , "F13975 " , "F13976 " , "F15334 " , "F12490 " , "F09787 " , "F14675 " ,
"F12129 " , "F04641 " , "F04680 " , "F04715 " , "F04753 " , "F08868 " , "F14031 " ,
"F14033 " , "F12585 " , "F14663 ")
gsub("[[:space:]]","",ss)
[1] "F00020" "F13975" "F13976" "F15334" "F12490" "F09787" "F14675" "F12129" "F04641" "F04680" "F04715"
[12] "F04753" "F08868" "F14031" "F14033" "F12585" "F14663"
df <- data.frame(Variable = ss)
gsub("[[:space:]]","",df$Variable)
[1] "F00020" "F13975" "F13976" "F15334" "F12490" "F09787" "F14675" "F12129" "F04641" "F04680" "F04715"
[12] "F04753" "F08868" "F14031" "F14033" "F12585" "F14663"
An easy solution for your use case is with trimws:
trimws(ss)
[1] "F00020" "F13975" "F13976" "F15334" "F12490" "F09787" "F14675" "F12129" "F04641" "F04680" "F04715"
[12] "F04753" "F08868" "F14031" "F14033" "F12585" "F14663"
As others have noted, your solution works too. So does this shorter one:
sub("\\s", "", ss) # gsub is not needed if each string contains exactly one whitespace character (in whatever position)
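
One possible explanation, which is an assumption because the question does not show the surrounding code: `gsub` returns a new vector and never changes the data frame in place, so the result has to be assigned back to the column. A minimal sketch with a shortened, hypothetical data frame:

```r
# Hypothetical data frame with a trailing space in each value.
df <- data.frame(Variable = c("F00020 ", "F13975 ", "F13976 "),
                 stringsAsFactors = FALSE)

# Calling gsub() on its own only returns the cleaned values; df is unchanged.
gsub("[[:space:]]", "", df$Variable)
nchar(df$Variable)   # still 7 characters each

# Assign the result back to actually update the column.
df$Variable <- trimws(df$Variable)
nchar(df$Variable)   # now 6 characters each
```

If the column still looks padded after assignment, the extra characters may not be ordinary whitespace, which would need to be checked separately.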

regex, R and [:punct:] in grep --> return items from a list not containing any [:punct:] [duplicate]

This question already has an answer here:
POSIX character class does not work in base R regex
(1 answer)
I have this list of strings:
stringg <- c("csv.asef", "ac ed", "asdf$", "asdf", "dasf]", "sadf {sadf")
If I try to get all strings containing special characters like so:
grep("[:punct:]+", stringg, value = TRUE)
--------------------------------------------
Result:
[1] "csv.asef" "ac ed"
What I should get is:
[1] "csv.asef" "asdf$" "dasf]" "sadf {sadf"
If I use:
grep("[!\\"#$%&’()*+,-./:;<=>?#[]^_`{|}~.]+", stringg, value = TRUE)
-----------------------------------------------------------------
Result is ERROR
I want these special characters: € ! " # $ % & ’ ( ) * + , - . / : ; < = > ? # [ ] ^ _ ` { | } ~. which [:punct:] doesn't have
I know if I want the strings not containing any of those characters then I would use:
[^ € ! " # $ % & ’ ( ) * + , - . / : ; < = > ? # [ ] ^ _ ` { | } ~.]
but how do I do it with [:punct:]?
[^:punct:]?
[^:punct:]{0}?
And how could I combine ^[:punct:] | ^€ ?
Many thanks.
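A POSIX character class such as [:punct:] is only valid inside a bracket expression, so it has to be written as [[:punct:]]. Also, according to ?regex:
Most metacharacters lose their special meaning inside a character class. To include a literal ], place it first in the list.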
grep("[[:punct:]]+", stringg, value = TRUE)
#[1] "csv.asef" "asdf$" "dasf]" "sadf {sadf"
If we want the opposite, use invert = TRUE
grep("[[:punct:]€]", stringg, value = TRUE, invert = TRUE)
#[1] "ac ed" "asdf"
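
To address the `[^:punct:]` attempts from the question directly: the negated form also needs the outer brackets, i.e. `[^[:punct:]]`. The sketch below reuses `stringg` from the question and leaves the euro sign out for simplicity; the names `with_punct` and `without_punct` are illustrative.

```r
stringg <- c("csv.asef", "ac ed", "asdf$", "asdf", "dasf]", "sadf {sadf")

# "[:punct:]" on its own is just the set of characters : p u n c t,
# so it matches any string that contains one of those characters.
grep("[:punct:]", stringg, value = TRUE)

# Wrapped in an outer bracket expression it becomes the punctuation class.
with_punct <- grep("[[:punct:]]", stringg, value = TRUE)

# Negated class anchored at both ends: strings made up only of
# non-punctuation characters.
without_punct <- grep("^[^[:punct:]]*$", stringg, value = TRUE)

print(with_punct)
print(without_punct)
```

The anchored negated class and `invert = TRUE` select the same strings here; the anchored form is useful when the test has to be part of a larger pattern.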

Insert characters when a string changes its case R

I would like to insert characters at the places where a string changes its case. I tried inserting a '\n' after a fixed number of characters and then a ' ', as I can't figure out how to detect the case change:
s <-c("FloridaIslandE7", "FloridaIslandE9", "Meta")
gsub('^(.{7})(.{6})(.*)$', '\\1\\\n\\2 \\3', s )
[1] "Florida\nIsland E7" "Florida\nIsland E9" "Meta"
This works because the positions are fixed but I would like to know how to do it for the general case.
Surely there's a less convoluted regex for this, but you could try:
gsub('([A-Z][0-9])', ' \\1', gsub('([a-z])([A-Z])', '\\1\n\\2', s))
Output:
[1] "Florida\nIsland E7" "Florida\nIsland E9" "Meta"
Here is an option using stringr:
str_replace_all(s, "(?<=[a-z])(?=[A-Z])", "\n")
#[1] "Florida\nIsland\nE7" "Florida\nIsland\nE9" "Meta"
If you really want to insert \n, try this:
gsub("([a-z])([A-Z])", "\\1\\\n\\2", s)
[1] "Florida\nIsland\nE7" "Florida\nIsland\nE9" "Meta"
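
If the goal is the exact output from the question (a newline before a capitalised word, a space before a letter-digit code), the two cases can be told apart with lookarounds in base R. This is a sketch, assuming the codes always consist of an uppercase letter followed by a digit; `step1` and `result` are illustrative names.

```r
s <- c("FloridaIslandE7", "FloridaIslandE9", "Meta")

# Newline where a lowercase letter is followed by the start of a
# capitalised word (uppercase then lowercase). perl = TRUE enables lookarounds.
step1 <- gsub("(?<=[a-z])(?=[A-Z][a-z])", "\n", s, perl = TRUE)

# Space where a lowercase letter is followed by an uppercase letter and a digit.
result <- gsub("(?<=[a-z])(?=[A-Z][0-9])", " ", step1, perl = TRUE)
print(result)
```

Because the lookarounds match positions rather than characters, nothing has to be captured and re-inserted in the replacement.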

How to print double quotes (") in R

I want to print double quotes (") to the screen in R, but it is not working. The usual escape characters do not help:
> print('"')
[1] "\""
> print('\"')
[1] "\""
> print('/"')
[1] "/\""
> print('`"')
[1] "`\""
> print('"xml"')
[1] "\"xml\""
> print('\"xml\"')
[1] "\"xml\""
> print('\\"xml\\"')
[1] "\\\"xml\\\""
I want it to return:
" "xml" "
which I will then use downstream.
Any ideas?
Use cat:
cat("\" \"xml\" \"")
OR
cat('" "','xml','" "', sep = "")
Output:
" "xml" "
Alternative using noquote:
noquote(" \" \"xml\" \" ")
Output :
" "xml" "
Another option using dQuote:
dQuote(" xml ")
Output :
"“ xml ”"
With the help of the print parameter quote:
print("\" \"xml\" \"", quote = FALSE)
[1] " "xml" "
or
cat('"')
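
The backslashes in the `print` output are only how R displays the string; they are not stored in it. A short sketch to check this, using `writeLines` as one more way to show the raw text (the variable name `x` is illustrative):

```r
x <- "\" \"xml\" \""

# print() shows the escaped form of the string.
print(x)

# nchar() counts the real characters; the backslashes are not among them.
nchar(x)

# writeLines() writes the raw text followed by a newline.
writeLines(x)

# The string can be used downstream as-is, e.g. pasted into a larger one.
paste0("value=", x)
```

So for downstream use nothing needs to be "unescaped"; only the display function has to be chosen.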

Highlight keywords in classic ASP

I have this sentence: "The man went outside".
I also have 4 search criteria that I would like to get highlighted (ignore the brackets), [went|"an WeNT o"|a|t], with [span id="something"][/span].
I have tried a lot of things but I can't figure out how to do this in classic ASP. If I insert a span somewhere in the text, later searches will either match the HTML code for the SPAN too, which is bad, or fail to find the text because it has been broken up by HTML code. I also tried inserting on all positions in the original text, and even some magic regular expression that I do not understand, but I can't get this working.
The search string is divided with | and can contain anything from 1 to 20 terms to search for.
Can anyone help me work out how to do this?
I found and tweaked some code and it works perfectly for me:
Function highlightStr (haystack, needles)
    ' Taken (and tweaked) from these two sites:
    ' http://forums.aspfree.com/asp-development-5/asp-highlight-keywords-295641.html
    ' http://www.eggheadcafe.com/forumarchives/scriptingVisualBasicscript/Jul2005/post23377133.asp
    '
    ' INPUT: haystack = search in this string
    ' INPUT: needles = searches divided by |... example: this|"is a"|search
    ' OUTPUT: HTML formatted highlighted string
    '
    Dim re
    If Len(haystack) > 0 Then
        ' Delete the first and the last array separator "|" (if any)
        If Left(needles,1) = "|" Then needles = Right(needles,Len(needles)-1)
        If Right(needles,1) = "|" Then needles = Mid(needles,1,Len(needles)-1)
        ' Collapse a doubled separator (if any)
        needles = Replace(needles,"||","|")
        ' Delete the exact-search chars (if any)
        needles = Replace(needles,"""","")
        ' Escape parentheses and dots; other regex metacharacters are left as-is
        needles = Replace(needles,"(","\(")
        needles = Replace(needles,")","\)")
        needles = Replace(needles,".","\.")
        If Len(needles) > 0 Then
            ' Pad the text with one space on each side
            haystack = " " & haystack & " "
            Set re = New RegExp
            ' The | separators in needles act as regex alternation
            re.Pattern = "(" & needles & ")"
            re.IgnoreCase = True
            re.Global = True
            ' $& is the matched text, wrapped in a highlighting span
            highlightStr = re.Replace(haystack,"<span style='background-color:khaki;'>$&</span>")
        Else
            highlightStr = haystack
        End If
    Else
        highlightStr = haystack
    End If
End Function
