I wanna ask that why the regex ^(19([6-9]\d|5[2-9]))|(20([01]\d|2[0-7]))$ will reject 2027XXXX but not 1952XXXX - google-forms

I have also tried to add {4} at the end of the
regex(^(19([6-9]\d|5[2-9]))|(20([01]\d|2[0-7])){4}$)
However, it doesn't help to reject a longer input like 19991
example

Related

US Zip+4 Validation

I have a req to provide US Zip+4 with the +4 being optional and the +4 can't be 0000. I'm doing this in .NET therefore I'm using RegularExpressionValidator with RegEx set. In my first validator I'm checking if the Zip code is xxxxx-xxxx or xxxxx format that is 5+4 or 5. In my 2nd validator I check if the last 4 are not set to 0000. This means 1234-0000 is invalid. These are my Regex and I want to be sure they are valid. Seems they test okay, however when cross checking them with the regex101 app online I'm getting different behavior than .NET.
xxxxx-xxxx or xxxxx = ^[0-9]{5}(?:-[0-9]{4})?$
xxxxx-0000 = \d{5}(?!-0000).*
This last one I quite don't understand how it works, but it seems to work. Someone help explain me the ?! and .* they both seem to need to be necessary for this to function. My understanding is the .* means all char and the ?! means negative lookahead????
Actually, the regex pattern I would suggest here is actually a combination of the two you provided above:
^[0-9]{5}(?!-0000$)(?:-[0-9]{4})?$
Demo
Here is an explanation of the pattern:
^ from the start of the ZIP code
[0-9]{5} match a 5 digit ZIP code
(?!-0000$) then assert that the PO box is NOT -0000
(?:-[0-9]{4})? match an optional -xxxx PO box (which can't be 0000)
$ end of the ZIP code
Of note, the (?!-0000$) term is called a negative lookahead, because it looks ahead in the input and asserts that what follows is not -0000. But, using a lookahead does not advance the pattern, so after completing the negative assertion, the pattern continues trying to match an optional -xxxx PO box following.

Split string in R,better to use regular expressions

I'd like to split a text string in R
for example:"cell_70001.ERP123.138_D11_62.5Y_45880"
But,I want ERP123.138_D11_62.5Y_45880 finally.
That is to say, cut the place where the first punctuation starts, get the part after it,
I really don’t understand regular expressions, but I’m very anxious. I hope someone can help me. Thank you.
Since your aim is to get where the first punctuation starts, considering _ is a word, we could do:
sub(".*?\\W","", "cell_70001.ERP123.138_D11_62.5Y_45880")
[1] "ERP123.138_D11_62.5Y_45880"
This is to say, delete everything until the first non-word character.
You could also do it as:
sub("\\w+\\W","", "cell_70001.ERP123.138_D11_62.5Y_45880")
[1] "ERP123.138_D11_62.5Y_45880"
Which means delete every word until the first non-word. Which is then also deleted

Cleaning forum post with multiple quotations in rvest + stringr

I am scraping a very long forum thread, and I want to come up with a database that has columns containing the following info: date / full post text / quoted user / quoted text / clean text
The clean text should be each user's post, without the quotations if they are replying to anyone. if the post is not a reply, I would leave it as NA. The following is an invented post, with invented user, to illustrate what I have managed to do so far:
post<-"Meow1 wrote: »\noday is gonna be the day that they're gonna throw it back to you?\nBy now you should've somehow Realized what you gotta do\n\n\nI don't believe that anybody Feels the way I do, about you now\nMeow1 wrote: »\nI'm sure you've heard it all before But you never really had a doubt\n\n\nBecause maybe, you're gonna be the one that saves me\nMeow1 wrote: »\nAnd after all, you're my wonderwall\n\n\nAnd all the lights that lead us there are blinding"
Then I try to pull out the quoted user (Meow1) and it works:
QuotedUser_1<-ifelse(grepl('wrote:', post), gsub('\\s*wrote.*$', '', post), NA)
QuotedUser_1
[1] "Meow1"
Then I created this codes for pulling out the quoted text, and the clean text:
Quotedtext_1<- ifelse(grepl('wrote:', post), gsub('^.*wrote\\s*|\\s*\\n\\n\\n.*$', '', post), NA)
It works when there is only one quoted text, but otherwise, it only gives the last quoted bit (in the example, 'And after all, you´re my wonderwall')
And same for the clean text, it only returns the last reply:
Clean_text<- sub('^.*\\n\\n\\n\\s*|\\s*wrote.*', '', post)
If anyone has a suggestion to improve the code, so that I can have a vector with all the quotations, and a vector with all the replies, I would be very grateful...
Cheers
Are you sure you cannot scrape the author and text information separately? Without a source it's difficult to know, but I guess they can be obtained by different css-selectors making it much easier to split the data.
If not, it might be helpful to look into str_locate_all which allows you to locate all occurences of e.g. "wrote:" and split the string accordingly.

R Using Regex to find a word after a pattern

I'm grabbing the following page and storing it in R with the following code:
gQuery <- getURL("https://www.google.com/#q=mcdimalds")
Within this, there's the following snippet of code
Showing results for</span> <a class="spell" href="/search?rlz=1C1CHZL_enUS743US743&q=mcdonalds&spell=1&sa=X&ved=0ahUKEwj9koqPx_TTAhUKLSYKHRWfDlYQvwUIIygA"><b><i>mcdonalds</i></b></a>
Everything other than "showing results for" and the italics tags encasing the desired name for extraction are subject to change from query to query.
What I want to do is extract the mcdonalds out of this string using regex that occurs here: <b><i>mcdonalds</i> aka the second instance of mcdonalds. However, I'm not too sure how to write the regex to do so.
Any help accomplishing this would be greatly appreciated. As always, please let me know if any additional information should be added to clarify the question.

Regex for ASP.NET url rewrite

Sample text =
legacycard.ashx?save=false&iNo=3&No=555
Sample pattern =
^legacycard.ashx(.*)No=(\d+)
Want to grab group #2 value of "555" (the value of "No=" in the sample text)
In Expresso, this works, but in ASP.NET UrlRewrite, it is not catching.
Am I missing something?
Thanks!
I would do something along these lines:
^legacycard.ashx\?(?:.+&)*No=(\d+)
The \? will escape the question mark that normally separates the URL and the parameters, then you make sure that it will capture every parameter key/value pair (anything that ends on &) before the parameter you actually care about. Using ?: lets you specify that the set of brackets is non capturing (I'm assuming you won't need any of the data, has the potential to slightly speeds up your regex) and leaves you just 555 captured. The added benefit of this approach is that it'll work regardless of parameter order.
Just use this regex:
^legacycard\.ashx\?save=(false|true)&iNo=(?<ino>\d+)&No=(?<no>\d+)
Then Regex Replace with
${no}
Looks fine to me, your regex should match the entire string
legacycard.ashx?save=false&iNo=3&No=555
not sure why you have groups, but groups should also return
?save=false&iNo=3&
and
555
For good measure you should know that the . in legacycard.ashx is also interpreted by regex and you would normally escape it, in this case it dosen't matter because a single dot matches everything, also a dot. :)
Try this
^legacycard.ashx(\?No=|.*?&No=)(\d+)
this should work.

Resources