I want to run header checks after Postfix has added the missing headers.
main.cf:
always_add_missing_headers = yes
header_checks:
# 2022-10-25 14:21:34.371985
/^Message-ID:/i PREPEND X-MY-HEADER: HAHA
# ------------------------------
/^To:.*6#test.com./ DISCARD
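For reference, individual header_checks rules can be tested from the shell with postmap -q, which is documented in the header_checks(5) man page; the path below assumes the rules live in /etc/postfix/header_checks:
postmap -q "Message-ID: <123@example.com>" regexp:/etc/postfix/header_checks
The command prints the action of the first matching rule, e.g. PREPEND X-MY-HEADER: HAHA.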
How is it possible to join multiple lines of a log file into 1 dataframe row?
Example log file with four entries:
[WARN ][2016-12-16 13:43:10,138][ConfigManagerLoader] - [Low max memory=477102080. Java max memory=1000 MB is recommended for production use, as a minimum.]
[DEBUG][2016-05-26 10:10:22,185][DataSourceImpl] - [SELECT mr.lb_id,mr.lf_id,mr.mr_id FROM mr WHERE (( mr.cap_em >
0 AND mr.cap_em > 5
)) ORDER BY mr.lb_id, mr.lf_id, mr.mr_id]
[ERROR][2016-12-21 13:51:04,710][DWRWorkflowService] - [Update Wizard - : [DWR WFR request error:
workflow rule = BenCommonResources-getDataRecords
version = 2.0
filterValues = [{"fieldName": "wotable_hwohtable.status", "filterValue": "CLOSED"}, {"fieldName": "wotable_hwohtable.status_clearance", "filterValue": "Goods Delivered"}]
sortValues = [{"fieldName": "wotable_hwohtable.cost_actual", "sortOrder": -1}]
Result code = ruleFailed
Result message = Database error while processing request.
Result details = null
]]
[INFO ][2019-03-15 12:34:55,886][DefaultListableBeanFactory] - [Overriding bean definition for bean 'cpnreq': replacing [Generic bean: class [com.ar.moves.domain.bom.Cpnreq]; scope=prototype; abstract=false; lazyInit=false; autowireMode=0; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=null; factoryMethodName=null; initMethodName=null; destroyMethodName=null; defined in URL [jar:file:/D:/Dev/404.jar!/com/ar/moves/moves-context.xml]] with [Generic bean: class [com.ar.bl.bom.domain.Cpnreq]; scope=prototype; abstract=false; lazyInit=false; autowireMode=0; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=null; factoryMethodName=null; initMethodName=null; destroyMethodName=null; defined in URL [jar:file:/D:/Dev/Tools/Tomcatv8.5-appGit-master/404.jar!/com/ar/bl/bom/bl-bom-context.xml]]]
(See representative 8-line extract at https://pastebin.com/bsmWWCgw.)
The structure is clean:
[PRIOR][datetime][ClassName] - [Msg]
but the message is often multi-line, may itself contain brackets (even trailing ones…), and may contain ^M (carriage-return) line endings, though not always. That makes it difficult to parse, and I don't know where to begin…
So, in order to process such a file, and be able to read it with something like:
#!/usr/bin/env Rscript
df <- read.table('D:/logfile.log')
we first need those multi-line entries merged into single rows. How can that be done?
The goal is to load the whole log file for graphics and analysis (grepping out stuff), and eventually to write it back to a file, so -- if possible -- newlines should be kept in order to respect the original formatting.
The expected dataframe would look like:
PRIOR Datetime ClassName Msg
----- ------------------- ------------------- ----------
WARN 2016-12-16 13:43:10 ConfigManagerLoader Low max...
DEBUG 2016-05-26 10:10:22 DataSourceImpl SELECT ...
And, ideally once again, this should be doable directly in R, so that we can process a live log file (one kept open in write mode by the server app), à la tail -f.
This is a pretty wicked regex bomb. I'd recommend using the stringr package, but you could do all of this with base grep-style functions.
library(stringr)
library(magrittr) # provides the %>% pipe used below
str <- c(
'[WARN ][2016-12-16 13:43:10,138][ConfigManagerLoader] - [Low max memory=477102080. Java max memory=1000 MB is recommended for production use, as a minimum.]
[DEBUG][2016-05-26 10:10:22,185][DataSourceImpl] - [SELECT mr.lb_id,mr.lf_id,mr.mr_id FROM mr WHERE (( mr.cap_em >
0 AND mr.cap_em > 5
)) ORDER BY mr.lb_id, mr.lf_id, mr.mr_id]
[ERROR][2016-12-21 13:51:04,710][DWRWorkflowService] - [Update Wizard - : [DWR WFR request error:
workflow rule = BenCommonResources-getDataRecords
version = 2.0
filterValues = [{"fieldName": "wotable_hwohtable.status", "filterValue": "CLOSED"}, {"fieldName": "wotable_hwohtable.status_clearance", "filterValue": "Goods Delivered"}]
sortValues = [{"fieldName": "wotable_hwohtable.cost_actual", "sortOrder": -1}]
Result code = ruleFailed
Result message = Database error while processing request.
Result details = null
]]'
)
Using regex we can split the text into entries by checking for the pattern you mentioned. The regex looks for a [, followed by any non-newline character, line feed, or carriage return, followed by a ], but in a lazy (non-greedy) way, using *?. It repeats that three times, then checks for a -. Finally, it checks for a [, followed by any characters (or a group of information within square brackets), then a closing ]. That's a mouthful; type it into a regex tester to see it laid out. Just remember to remove the extra backslashes (a regex tester uses \, but R source code uses \\).
# Split the text into each line without using \n or \r.
# pattern for each line is a lazy (non-greedy) [][][] - []
linesplit <- str %>%
# str_remove_all("\n") %>%
# str_extract_all('\\[(.|\\n|\\r)+\\]')
str_extract_all('\\[(.|\\n|\\r)*?\\]\\[(.|\\n|\\r)*?\\]\\[(.|\\n|\\r)*?\\] - \\[(.|\\n|\\r|(\\[(.|\\n|\\r)*?\\]))*?\\]') %>%
unlist()
linesplit # Run this to view what happened
Now that we have each entry separated, break them into columns. We don't want to keep the [ or ], so we use a positive lookbehind and a positive lookahead in the regex to check that they are there without capturing them. Oh, and capture everything between them, of course.
# Split each line into columns
colsplit <- linesplit %>%
str_extract_all("(?<=\\[)(.|\\n|\\r)*?(?=\\])")
colsplit # Run this to view what happened
Now we have a list with one element per entry, each holding the 4 items for the columns. We convert each set of 4 items to a dataframe and then join those dataframes together.
# Convert each line to a dataframe, then join the dataframes together
df <- lapply(colsplit,
function(x){
data.frame(
PRIOR = x[1],
Datetime = x[2],
ClassName = x[3],
Msg = x[4],
stringsAsFactors = FALSE
)
}
) %>%
do.call(rbind,.)
df
# PRIOR Datetime ClassName Msg
# 1 WARN 2016-12-16 13:43:10,138 ConfigManagerLoader Low max memory=
# 2 DEBUG 2016-05-26 10:10:22,185 DataSourceImpl SELECT mr.lb_id
# 3 ERROR 2016-12-21 13:51:04,710 DWRWorkflowService Update Wizard -
# Note: there are extra spaces that probably should be trimmed,
# and the dates are slightly messed up. I'll leave those for the
# questioner to fix using a mutate and the string functions.
I'll leave it to you to trim the extra spaces and clean up the date field.
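If it helps, here is one way to finish that cleanup (a sketch only, assuming dplyr is acceptable alongside stringr, and using df as built above; the sub() swaps the comma millisecond separator for the period that %OS expects):
library(dplyr)
df_clean <- df %>%
  mutate(
    PRIOR = str_trim(PRIOR),                 # drop the padding spaces
    Datetime = as.POSIXct(sub(",", ".", Datetime),
                          format = "%Y-%m-%d %H:%M:%OS")
  )
df_clean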
I'm still new to R and regexes, but I'm trying to achieve the following. Suppose I have a data table of the following sort:
Title | URL
stackoverflow.com | https://stackoverflow.com
google.com | http://
youtube.com | https://youtube.com
overclock.net | https://
I want to append the cells in column URL with their corresponding value in column Title, in case URL consists only of either http:// or https://, so the desired output would look as follows:
Title | URL
stackoverflow.com | https://stackoverflow.com
google.com | http://google.com
youtube.com | https://youtube.com
overclock.net | https://overclock.net
To do so, I tried using the sub function in conjunction with a lookahead regex as follows:
dt$URL <- sub("(?:^|\\W)https?://(?:$|\\W)", "\\1", dt$Title, perl = TRUE)
or
dt$URL <- sub("\\s(https?://)", "\\1", dt$Title, perl = TRUE)
or
dt$URL <- sub("\\b(https?://\\b)", "\\1", dt$Title, perl = TRUE)
But none of the above produces the desired output. Either nothing is appended or replaced at all (presumably because the regex doesn't match anything), or the pattern also matches when the URL contains more than just http:// or https://, i.e. it matches a full domain name as well (which I do not want). How should I adjust my code so that it produces the desired output, given the example input above?
Thank you!
url.col <- c("https://stackoverflow.com",
"http://",
"https://youtube.com",
"https://")
title.col <- c("stackoverflow.com",
"google.com",
"youtube.com",
"overclock.net")
ifelse(grepl("^https?://$", url.col),      # if URL is only the scheme:
paste0(url.col, title.col), # join content of cols together and return!
url.col) # but if not return url.col element 'as is'
[1] "https://stackoverflow.com"
[2] "http://google.com"
[3] "https://youtube.com"
[4] "https://overclock.net"
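To write the result back into the table from the question (a small sketch; dt is assumed to be a data.frame with the Title and URL columns shown above):
dt <- data.frame(Title = title.col, URL = url.col, stringsAsFactors = FALSE)
dt$URL <- ifelse(grepl("^https?://$", dt$URL),
                 paste0(dt$URL, dt$Title),
                 dt$URL)
dt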
I have a list of files that have identical filenames but live in different subfolders. The values in the files are separated with a tab.
I would like to prepend to each "test.txt" an additional first column containing the folder name, and then merge everything into one file at the end (they all have the same column header).
The most important command though would be the merging.
I have tried so many commands now that did not work, so I guess I am missing an essential step with awk...
Current structure is:
mainfolder
|-> Folder1
|   |-> test.txt
|-> Folder2
|   |-> test.txt
.
.
.
This is what I would like each file to look like before merging them all. Current content:
#Name Count FragCount Type Left LeftB Right RightB Support FRPM LeftBD LeftBE RightBD RightBE annots
RFP1A 13 10 REF RFP1A_ins chr3:3124352:+ RFP1A_ins chr3:5234143:+ confirmed 0.86 TA 1.454 AC 1.564 ["INTRACHROM."]
Desired content (folder name prepended as a first column):
#Samplename #Name Count FragCount Type Left LeftB Right RightB Support FRPM LeftBD LeftBE RightBD RightBE annots
Sample1 RFP1A 13 10 REF RFP1A_ins chr3:3124352:+ RFP1A_ins chr3:5234143:+ confirmed 0.86 TA 1.454 AC 1.564 ["INTRACHROM."]
Thanks so much!!
D
I believe this might do the trick:
$ cd mainfolder
$ awk '(NR==1){sub("#","#Samplename\t"); print} # print header
(FNR==1){next} # skip header
{print substr(FILENAME,1,match(FILENAME,"/")-1)"\t"$0 } # add directory
' */test.txt > /path/to/newfile.txt
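To see what the substr/match part is doing in isolation (just an illustration: the folder name is everything before the first /):
$ awk 'BEGIN { f = "Folder1/test.txt"; print substr(f, 1, match(f, "/") - 1) }'
Folder1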
I am trying to use Pyparsing to identify a keyword which does not begin with $. So for the following input:
$abc = 5 # is not a valid one
abc123 = 10 # is valid one
abc$ = 23 # is a valid one
I tried the following
var = Word(printables, excludeChars='$')
var.parseString('$abc')
But this doesn't allow any $ in var. How can I specify all printable characters other than $ in the first character position? Any help will be appreciated.
Thanks
Abhijit
You can use the method I used to define "all characters except X" before I added the excludeChars parameter to the Word class:
NOT_DOLLAR_SIGN = ''.join(c for c in printables if c != '$')
keyword_not_starting_with_dollar = Word(NOT_DOLLAR_SIGN, printables)
This should be a bit more efficient than building it up with a Combine and a NotAny. But this will match almost anything: integers, words, valid identifiers, invalid identifiers. So I'm skeptical of the value of this kind of expression in your parser.
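For completeness, here is that suggestion as a small runnable sketch (assuming a reasonably recent pyparsing; parseString raises ParseException when the first character is $):
from pyparsing import Word, printables, ParseException

NOT_DOLLAR_SIGN = ''.join(c for c in printables if c != '$')
var = Word(NOT_DOLLAR_SIGN, printables)  # first char: anything but $; rest: any printable

print(var.parseString('abc123'))  # -> ['abc123']
print(var.parseString('abc$'))    # -> ['abc$']
try:
    var.parseString('$abc')
except ParseException as err:
    print('rejected:', err)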
I have files (~1k) that look (basically) like this:
NAME1.txt
NAME ATTR VALUE
NAME1 x 1
NAME1 y 2
...
NAME2.txt
NAME ATTR VALUE
NAME2 x 19
NAME2 y 23
...
Where the ATTR column is the same in every file and the NAME column is just some version of the filename. I would like to combine them together into 1 file that looks like:
All_data.txt
ATTR  NAME1_VALUE  NAME2_VALUE  NAME3_VALUE ...
x     1            19           ...
y     2            23           ...
...
Is there a simple way to do this with just command-line utilities, or will I have to resort to writing a script?
Thanks
You need to write a script.
gawk is the obvious candidate
You could build an associative array while the lines are read (not in a BEGIN block, where FILENAME is not yet set), using FILENAME as the key and
ATTR " " VALUE
as the value.
Then create your output in an END block.
gawk can process all the .txt files together if you pass *.txt as the filename argument.
It's a bit optimistic to expect a ready-made command to do exactly what you want; very few commands join data horizontally.
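A sketch of that approach (my own illustration, not a tested production script: it assumes every file has a one-line header, the same ATTRs, and whitespace-separated columns, and it strips a .txt suffix to build the NAME1_VALUE style column headers):
# merge_attrs.awk
FNR == 1 { next }                                # skip each file's header line
!($2 in attrs) { attrs[$2]; arow[++na] = $2 }    # remember ATTR order
!(FILENAME in seen) { seen[FILENAME]; fcol[++nf] = FILENAME }
{ value[FILENAME, $2] = $3 }                     # VALUE keyed by file + ATTR
END {
    printf "ATTR"
    for (j = 1; j <= nf; j++) {
        name = fcol[j]; sub(/\.txt$/, "", name)  # NAME1.txt -> NAME1
        printf "\t%s_VALUE", name
    }
    print ""
    for (i = 1; i <= na; i++) {
        printf "%s", arow[i]
        for (j = 1; j <= nf; j++) printf "\t%s", value[fcol[j], arow[i]]
        print ""
    }
}
Run it as: gawk -f merge_attrs.awk *.txt > All_data.txt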