How do I deal with pattern not matched in fluentd - airflow

Could someone please help me fix the problem below? Fluentd reports "pattern not matched":
2022-07-13 12:25:19 +1000 [warn]: #0 [dag-processor-manager.log] pattern not matched: "================================================================================\n[2022-07-13 12:25:19,814] {manager.py:803} INFO - "

Related

Gsub error in R---regex expression rejected--not able to fix bad Json [duplicate]

This question already has answers here:
Error: '\R' is an unrecognized escape in character string starting "C:\R"
(5 answers)
gsub, lookahead and lookbehind
(2 answers)
Closed 14 days ago.
I'm trying to fix some poorly formatted json in R that has single quotes rather than double quotes. The file also has valid single quotes within the document, so I can't simply replace them all with double.
I've found a regular expression that should do what I want: it fixes the single-quote issue while leaving valid single quotes used as punctuation in the text intact.
((?<={)\s*\'|(?<=,)\s*\'|\'\s*(?=:)|(?<=:)\s*\'|\'\s*(?=,)|\'\s*(?=}))
The source of the pattern is here: Replace single quote with double quote with Regex.
Here's some sample data and the expected output of the replacement of " for ' in the proper contexts. Note that the possessive near Sultan needs to be preserved.
{'comment': "Every evening, the brave queen of Persia, Shahrazad, goes into the Sultan's rooms and begins a weave of words, hoping to entice the Sultan to let her live another night so she can continue her story. ", 'nhelpful': 0, 'unixtime': 1332288000, 'work': '73960', 'flags': [], 'user': 'Elizabeth.Wong98', 'stars': 4.5, 'time': 'Mar 21, 2012'}
{"comment": "Every evening, the brave queen of Persia, Shahrazad, goes into the Sultan's rooms and begins a weave of words, hoping to entice the Sultan to let her live another night so she can continue her story. ","nhelpful": 0,"unixtime": 1332288000,"work":"73960","flags": [],"user":"Elizabeth.Wong98","stars": 4.5,"time":"Mar 21, 2012"}
For some reason that I can't determine, this regex pattern is rejected in R's gsub function, but accepted in other applications. Can anyone help? Here is my failing script with gsub:
badjson<-"{'comment': \"Every evening, the brave queen of Persia, Shahrazad, goes into the Sultan's rooms and begins a weave of words, hoping to entice the Sultan to let her live another night so she can continue her story. \", 'nhelpful': 0, 'unixtime': 1332288000, 'work': '73960', 'flags': [], 'user': 'Elizabeth.Wong98', 'stars': 4.5, 'time': 'Mar 21, 2012'}"
gsub("((?<={)\s*\'|(?<=,)\s*\'|\'\s*(?=:)|(?<=:)\s*\'|\'\s*(?=,)|\'\s*(?=}))","\"",badjson)
I am getting the following error.
Error: '\s' is an unrecognized escape in character string starting ""((?<={)\s"
Also, switching out \s for double-slash s does not solve the issue.
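For comparison, here is a minimal Python sketch that applies the same pattern. It is not the R fix itself, only a check that the pattern behaves as described. In R, two things are likely involved: each backslash in a string literal has to be doubled (\\s), and lookbehind/lookahead are only supported by gsub when perl = TRUE is passed, which would explain why doubling the backslash alone did not help. The sample string below is a shortened, hypothetical version of the data in the question.

```python
import json
import re

# Pattern from the question, written as a raw string so that \s reaches
# the regex engine unchanged (no string-level escape processing).
QUOTE_FIX = re.compile(
    r"((?<={)\s*'|(?<=,)\s*'|'\s*(?=:)|(?<=:)\s*'|'\s*(?=,)|'\s*(?=}))"
)


def fix_single_quotes(text: str) -> str:
    """Replace structural single quotes with double quotes."""
    return QUOTE_FIX.sub('"', text)


# Shortened, hypothetical sample modelled on the data in the question.
bad_json = """{'comment': "the Sultan's rooms", 'stars': 4.5, 'work': '73960'}"""

fixed = fix_single_quotes(bad_json)
record = json.loads(fixed)  # parses only if the quotes were repaired
print(fixed)
print(record["comment"])
```

The possessive in "Sultan's" survives because that quote is neither preceded by {, a comma or a colon, nor followed by a colon, a comma or }.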

US Zip+4 Validation

I have a requirement to validate US ZIP+4 codes where the +4 part is optional and cannot be 0000. I'm doing this in .NET, so I'm using a RegularExpressionValidator with a regex set. My first validator checks that the ZIP code is in xxxxx-xxxx or xxxxx format, that is, 5+4 or 5 digits. My second validator checks that the last four digits are not 0000, which means 12345-0000 is invalid. These are my regexes, and I want to be sure they are valid. They seem to test okay, but when I cross-check them with the regex101 app online I get different behavior than in .NET.
xxxxx-xxxx or xxxxx = ^[0-9]{5}(?:-[0-9]{4})?$
xxxxx-0000 = \d{5}(?!-0000).*
I don't quite understand how this last one works, but it seems to work. Could someone explain the ?! and the .* to me? Both seem to be necessary for this to function. My understanding is that .* matches any sequence of characters and ?! is a negative lookahead.
The regex pattern I would suggest here is a combination of the two you provided above:
^[0-9]{5}(?!-0000$)(?:-[0-9]{4})?$
Here is an explanation of the pattern:
^ from the start of the ZIP code
[0-9]{5} match a 5 digit ZIP code
(?!-0000$) then assert that the +4 extension is NOT -0000
(?:-[0-9]{4})? match an optional -xxxx +4 extension (which can't be 0000)
$ end of the ZIP code
Of note, the (?!-0000$) term is called a negative lookahead, because it looks ahead in the input and asserts that what follows is not -0000. A lookahead does not consume any input, so after the negative assertion succeeds, the pattern continues from the same position and tries to match the optional -xxxx extension.
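The combined pattern can be checked quickly outside .NET. The following is a small Python sketch, with the function name and the test values chosen only for illustration:

```python
import re

# Combined pattern from the answer above.
ZIP_PLUS4 = re.compile(r"^[0-9]{5}(?!-0000$)(?:-[0-9]{4})?$")


def is_valid_zip(value: str) -> bool:
    """Return True for xxxxx or xxxxx-xxxx, except when the +4 part is 0000."""
    return ZIP_PLUS4.match(value) is not None


# Illustrative inputs (not taken from the question).
for candidate in ["12345", "12345-6789", "12345-0000", "1234", "12345-678"]:
    print(candidate, is_valid_zip(candidate))
```

Because the lookahead sits right after the five digits, a single pattern covers both of the original validators.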

Trying to do a multiple grep in R with regular expressions

I have to search for some records with a regular expression in a massive log file using R.
All the data is about 1 GB, and I have to look for some records and add the matches to a data frame.
What I am doing is the following:
grep(paste("(Sending+.+message+.+1234567890+.+5000)", sep=""), dfLogs)
This works correctly when I do it for only one record.
When I try to do the grep for all the searches:
dfTrx$RcvMessage <- paste("(Sending+.+message+.+", dfTrx$NUMBER, "+.+", dfTrx$AMOUNT,")", sep="")
dfReceived <- unique(grep(paste(dfTrx$RcvMessage, collapse="|"), dfLogs), value=TRUE)
And I get the following error:
Error in grep(paste(dfTrx$RcvMessage, collapse = "|"), dfLogs) :
invalid regular expression '(Sending+.+message+.+1234567890+.+20)|(Sending+.+message+.+9876543210+.+20)|...
How can I do this regular expression for all the values? What am I doing wrong?
An example for data:
2015-12-09 19:01:44,717 - [DEBUG] [pool-1-thread-4450 ] [OutputHandler ] Sending 8 message(s) : 01XX765903091220151901440XXXX0000129D3A00003996101901442015120903857655184776733438000000200001XX765904091220151901440XXXX0000118BC100001839671901442015120903857655194251212137000000300001XX765905091220151901440XXXX000010E52A00003311451901442015120903857655203331836622000000200001XX765906091220151901440XXXX000011DCD300001972561901442015120903857655215522476419000000300001XX765907091220151901440XXXX000012980900003923951901442015120903857655225531194531000000500001XX765908091220151901440XXXX000010ED2200003882461901442015120903857655237351043626000000200001XX765909091220151901440XXXX000011BDBE00001656451901442015120903857655243312669477000000200001XX765910091220151901440XXXX00001211F3000024385819014420151209038576552598211945310000002000
I need to find the sending of the message, and the number and amount in the content of the message.
Thanks, @Marek.
I found that the data frame had some bad values that were giving me NA, plus some spaces that I hadn't noticed before. It seems I didn't clean all the data properly. Thanks, and sorry for the silly mistake.
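The cleanup described above can be sketched as follows. This is a Python illustration rather than the original R code, and it makes a few assumptions: missing values are represented as None, the per-row pattern is simplified to Sending.+message.+NUMBER.+AMOUNT, and the function name and sample log lines are hypothetical.

```python
import re


def build_pattern(rows):
    """Build one alternation from (number, amount) pairs.

    Rows with missing or blank values are skipped, surrounding spaces are
    stripped, and the values are escaped so they are matched literally.
    """
    parts = []
    for number, amount in rows:
        if number is None or amount is None:
            continue  # skip missing values instead of pasting them in
        number, amount = str(number).strip(), str(amount).strip()
        if not number or not amount:
            continue
        parts.append(
            f"Sending.+message.+{re.escape(number)}.+{re.escape(amount)}"
        )
    if not parts:
        return None
    return re.compile("|".join(f"(?:{p})" for p in parts))


# Hypothetical rows: one clean, one with a missing value, one padded.
rows = [("1234567890", "5000"), (None, "20"), (" 9876543210 ", "20")]
pattern = build_pattern(rows)

# Hypothetical log lines, loosely modelled on the example data.
logs = [
    "Sending 8 message(s) : 001234567890XX5000",
    "Sending 1 message(s) : 9876543210 then 20",
    "Received 1 message(s) : 1234567890 5000",
]
matches = [line for line in logs if pattern.search(line)]
print(matches)
```

Filtering before joining keeps a single bad row from corrupting the whole alternation.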

Please explain the below unix code

echo "1,a,20,000,aa,s" | sed 's/,\([^0]\)/|\1/g'
Output:
1|a|20,000|aa|s
Please explain the above command.
I am unable to understand this execution.
The given command uses sed to substitute certain characters for other characters.
The basic form for this is
s/FIND/REPLACE/
where FIND is a regular expression and REPLACE is the replacement text.
The g at the end stands for global. It means that every occurrence of a pattern matching FIND in the input line is replaced, not only the first one.
The expressions used here are:
FIND ,\([^0]\) This pattern matches every two-character string that starts with a , followed by any character other than 0. The \( \) pair captures that second character.
REPLACE |\1 This produces a two-character string that starts with a | followed by the character captured in FIND. (\1 refers back to the first captured group.)
For a detailed overview of the sed commands I suggest you also read here: http://www.grymoire.com/Unix/Sed.html#uh-1
And to look up on how to read regular expressions: http://www.grymoire.com/Unix/Regular.html
Of course, there are many more sites on this topic if the pages above do not clear things up for you.
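To see the same substitution outside sed, here is an equivalent in Python. It is only an illustration of the pattern; note that Python writes the capture group as ( ) rather than sed's \( \).

```python
import re

line = "1,a,20,000,aa,s"

# ",([^0])" matches a comma followed by any character except 0 and captures
# that character; "|\1" writes a pipe followed by the captured character.
result = re.sub(r",([^0])", r"|\1", line)
print(result)  # 1|a|20,000|aa|s
```

The comma in 20,000 is left alone because the character after it is a 0, so the pattern does not match there.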

Url re-writing regular expression

Using ASP.NET, and trying to implement friendly url re-writing but I'm having a heck of a time working with the regular expression. Basically, I'm checking the url directly following the domain to see whether it is using the french-canadian culture, or whether it is a number - and not a number followed by characters. I need to catch anything that begins with 'fr-ca' OR a number, and both scenarios can have a trailing '/' (forward slash), but that's all...
fr-ca - GOOD
fr-ca/ - GOOD
fr-ca/625 - GOOD
fr-ca/gd - BAD
fr-ca43/ - BAD
1234 - GOOD
1234/ - GOOD
1234/g - GOOD
1234g - BAD
1g34 - BAD
This is what I've come up with : ^(fr-ca)?/?([0-9]+)?
But it doesn't seem to be working the way I want.. so I started fresh and came up with (^fr-ca)|(^[0-9]), which still isn't working the way I want. Please...HELP!
I have no idea about ASP.NET, but I tested the regexp with grep. You could try it on your .NET box:
kent$ cat a
fr-ca
fr-ca/
fr-ca/625
fr-ca/gd
fr-ca43/
1234
1234/
1234/g
1234g
1g34
updated
kent$ grep -P "(^fr-ca$|^\d+$|^(fr-ca|\d+)/(\w|\d*)$)" a
fr-ca
fr-ca/
fr-ca/625
1234
1234/
1234/g
--
Well, this may not be the best regex, but it is the most straightforward one.
It matches strings of the form
^fr-ca$
or ^\d+$
or ^(fr-ca|\d+)/(\w|\d*)$
The last line can be broken down as well:
^(fr-ca|\d+)/(\w|\d*)$ :
starts with fr-ca or \d+
then comes "/"
after "/" we expect a single \w character or \d*
then $ (end)
What about
^(fr-ca(?:\/\d*)?|[0-9]+(\/[a-zA-Z]*)?)$
See it here on Regexr, it matches all your good cases and not the bad ones.
Probably can try...
^(fr-ca|\d).*$
But of course this is a regex to match the entire string (as it has the end-of-string $ anchor). Do you want to pull out multiple matches?
In light of re-reading the post :)
^(fr-ca|\d+)(\/\d+|\/)?$
^(\d+(\/\w*)?)$|^(fr-ca\/\d+)$|^(fr-ca\/?)$
This worked for me for all of your examples. I'm not sure your intention for using this regex so I don't know if it is capturing exactly what you want to capture.
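A quick way to compare the suggestions is to run each pattern against the GOOD/BAD list from the question. The sketch below does this in Python instead of .NET, on the assumption that these constructs behave the same way in both engines; the dictionary keys are just hypothetical labels for three of the patterns posted above.

```python
import re

# Cases taken from the question.
GOOD = ["fr-ca", "fr-ca/", "fr-ca/625", "1234", "1234/", "1234/g"]
BAD = ["fr-ca/gd", "fr-ca43/", "1234g", "1g34"]

# Three of the patterns suggested in the answers (labels are hypothetical).
CANDIDATES = {
    "alternation": r"^(fr-ca(?:\/\d*)?|[0-9]+(\/[a-zA-Z]*)?)$",
    "re-read": r"^(fr-ca|\d+)(\/\d+|\/)?$",
    "three-way": r"^(\d+(\/\w*)?)$|^(fr-ca\/\d+)$|^(fr-ca\/?)$",
}


def misclassified(pattern):
    """Return GOOD cases the pattern rejects plus BAD cases it accepts."""
    rx = re.compile(pattern)
    wrong = [s for s in GOOD if not rx.search(s)]
    wrong += [s for s in BAD if rx.search(s)]
    return wrong


for name, pattern in CANDIDATES.items():
    print(name, misclassified(pattern))
```

An empty list means the pattern agrees with every case in the question; any entries show which inputs it gets wrong.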
