Unix - sed get value from a line after a first colon [duplicate]

I have a file (newline.txt) that contains the following line
Footer - Count: 00034300, Facility: TRACE, File Created: 20160506155539
I am trying to get the value after Count: up to the comma (in the example 00034300) from this line.
I tried the command below, but it concatenates all the digits on the line into one long string:
grep -i "Count:" newline.txt | sed 's/[^0-9]//g'
Output: 0003430020160506155539
How do I get just the digits after Count: up to the first non-digit character?
I just need 00034300.

Using sed
$ sed '/[Cc]ount/ s/[^:]*: *//; s/,.*//' newline.txt
00034300
How it works:
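/[Cc]ount/ selects lines containing Count or count, so no separate grep is needed. This address applies only to the first s command.
s/[^:]*: *// removes everything up to and including the first colon, plus any spaces after the colon.
In what remains, s/,.*// removes everything from the first comma onward. Because sed prints every line by default, lines without Count are still printed, minus anything from their first comma on.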
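If the file may contain lines other than the Count line, a variant that prints only the matching lines could look like the sketch below. The two-line input is a made-up sample (the Header line is an assumption, not from the question), and the result is stored in a hypothetical variable named count.

```shell
# Build a hypothetical two-line input; only the Footer line is from the question.
tmp=$(mktemp)
printf '%s\n' \
  'Header - Facility: TRACE, File Created: 20160506155539' \
  'Footer - Count: 00034300, Facility: TRACE, File Created: 20160506155539' > "$tmp"

# -n turns off automatic printing. The braces apply both substitutions
# only to lines matching Count/count, and p prints just those lines.
count=$(sed -n '/[Cc]ount/ { s/[^:]*: *//; s/,.*//; p; }' "$tmp")
echo "$count"

rm -f "$tmp"
```

With this form the Header line produces no output at all.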
Using awk
$ awk -F'[[:blank:],]' '/[Cc]ount/ {print $4}' newline.txt
00034300
How it works:
-F'[[:blank:],]' tells awk to treat spaces, tabs, and commas as field separators.
/[Cc]ount/ selects lines that contain Count or count.
print $4 prints the fourth field on the selected lines.
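The field number 4 depends on the exact layout of the line. As a rough alternative, assuming the label is always written as Count: (or count:), the label itself can be used as the field separator so that the position of the value on the line does not matter. The variable names line and count are only for illustration.

```shell
line='Footer - Count: 00034300, Facility: TRACE, File Created: 20160506155539'

# Split the line on "Count: " (or "count: "); $2 is then everything after it.
# NF > 1 is true only when the separator was found on the line.
# sub() cuts $2 at the first non-digit character, leaving just the number.
count=$(printf '%s\n' "$line" |
  awk -F'[Cc]ount: *' 'NF > 1 { sub(/[^0-9].*/, "", $2); print $2 }')
echo "$count"
```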
Using grep
$ grep -oiP '(?<=Count: )[[:digit:]]+' newline.txt
00034300
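This prints the run of digits that immediately follows "Count: ". -o prints only the matched part, -i ignores case, and -P enables Perl-compatible regular expressions (a GNU grep feature), which the (?<=...) lookbehind requires.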

Related

Unix substitute multiple strings using a reference file [duplicate]

I have a reference file, and I want to use it to replace multiple strings in the files in a directory. I am using awk's gsub for that, but it replaces every occurrence of the string rather than only the exact word. How can I stop that behaviour and replace just the whole word? In this case the word is "IT".
My reference file
$ cat dev_to_prod.config
nonprod_DATA_PATH PROD_DATA_PATH
nonprod_ENCRYPTKEY PROD_ENCRYPTKEY
IT Business
My current data file
$ cat file.txt
IT
WITH
/IT/DFGh/erfe
/WITH/IT/sjfgh/hjIT/dfdsf/ITvjkl
Output with current code
awk 'FNR==NR{A[$1]=$2;next}{for(i in A)gsub(i,A[i])}1' dev_to_prod.config file.txt
Business
WBusinessH
/Business/DFGh/erfe
/WBusinessH/Business/sjfgh/hjBusiness/dfdsf/Businessvjkl
The GNU awk (gawk) man page says:
\< matches the empty string at the beginning of a word.
\> matches the empty string at the end of a word.
These operators are gawk extensions, so with gawk you can try:
awk 'FNR==NR{A[$1]=$2;next}{for(i in A)gsub("\\<"i"\\>",A[i])}1' dev_to_prod.config file.txt
Output:
Business
WITH
/Business/DFGh/erfe
/WITH/Business/sjfgh/hjIT/dfdsf/ITvjkl
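On an awk without \< and \> (they are GNU extensions), one possible workaround is to check the characters around each match by hand. The sketch below is only an illustration: replace_word is a hypothetical helper, "word character" is assumed to mean ASCII letters, digits, and underscore, and the files are recreated in a temporary directory so the example is self-contained.

```shell
dir=$(mktemp -d)
printf '%s\n' 'nonprod_DATA_PATH PROD_DATA_PATH' \
  'nonprod_ENCRYPTKEY PROD_ENCRYPTKEY' 'IT Business' > "$dir/dev_to_prod.config"
printf '%s\n' 'IT' 'WITH' '/IT/DFGh/erfe' \
  '/WITH/IT/sjfgh/hjIT/dfdsf/ITvjkl' > "$dir/file.txt"

result=$(awk '
# Replace word with repl only where it is not touching a word character.
# index() does a literal search, so word is not treated as a regex.
function replace_word(line, word, repl,    out, pos, wlen, before, after) {
    out = ""; wlen = length(word)
    while ((pos = index(line, word)) > 0) {
        if (pos > 1)              before = substr(line, pos - 1, 1)
        else if (length(out) > 0) before = substr(out, length(out), 1)
        else                      before = ""
        after = substr(line, pos + wlen, 1)
        if (before ~ /[A-Za-z0-9_]/ || after ~ /[A-Za-z0-9_]/)
            out = out substr(line, 1, pos + wlen - 1)   # part of a longer word: keep
        else
            out = out substr(line, 1, pos - 1) repl     # whole word: replace
        line = substr(line, pos + wlen)
    }
    return out line
}
FNR == NR { map[$1] = $2; next }     # first file: load the replacement table
{
    for (w in map) $0 = replace_word($0, w, map[w])
    print
}' "$dir/dev_to_prod.config" "$dir/file.txt")
printf '%s\n' "$result"

rm -rf "$dir"
```

For the sample data this should give the same four lines as the gawk version.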

Unix: multi and single character delimiter in cut or awk commands

This is the string I have:
my_file1.txt-myfile2.txt_my_file3.txt
I want to remove all the characters after the first "_" that follows the first ".txt".
From the above example, I want the output to be my_file1.txt-myfile2.txt. I have to search for first occurrence of ".txt" and continue parsing until I find the underscore character, and remove everything from there on.
Is it possible to do it in sed/awk/cut etc commands?
You can't do this job with cut, but you can with sed (GNU sed here, since the command relies on \n in the replacement) and awk:
$ sed 's/\.txt/\n/g; s/\([^\n]*\n[^_]*\)_.*/\1/; s/\n/.txt/g' file
my_file1.txt-myfile2.txt
$ awk 'match($0,/\.txt[^_]*_/){print substr($0,1,RSTART+RLENGTH-2)}' file
my_file1.txt-myfile2.txt
Another option, written based on the samples shown:
awk '{sub(/\.txt_.*/,".txt")} 1' Input_file
This replaces everything from the first .txt_ to the end of the line with .txt and then prints the line. It assumes the underscore comes immediately after a .txt.
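If the string is already in a shell variable, plain POSIX parameter expansion may be enough, with no external command. This is a minimal sketch that assumes the string contains at least one ".txt"; the variable names s, prefix, rest, and result are made up for the example.

```shell
s='my_file1.txt-myfile2.txt_my_file3.txt'

prefix=${s%%.txt*}        # text before the first ".txt"
rest=${s#"$prefix".txt}   # text after the first ".txt"
# Keep the first ".txt", then whatever follows it up to the first "_".
result="$prefix.txt${rest%%_*}"
echo "$result"
```

Unlike the last awk command, this does not need the underscore to sit directly after a ".txt".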

Remove new line character after a specific word ignore space and tab

I am parsing a SQL script on Unix. If FROM is the first word on a line, that line should be merged with the previous line. If FROM is the last word on a line, that line should be merged with the next line. E.g.:
A
FROM
B
I want the result as
A FROM B
Any spaces and tabs around FROM should be ignored.
Code:
cat A.txt | sed ':a;N;$!ba;s|[Ff][Rr][Oo][Mm][\s\t]*\n|FROM |g;s/\n\s*\t*[Ff][Rr][Oo][Mm]/ FROM/g' >B.txt
Here is one using GNU awk and gensub. It replaces any combination of spaces, newlines, and tabs before and after the word FROM with a single space (carriage returns are omitted because of the unix tag). It sets RS to the empty string, so awk reads the input in paragraph mode: a record ends at a blank line or at the end of the file.
$ awk 'BEGIN{RS=""}{$0=gensub(/[ \t\n]+(FROM)[ \t\n]+/," \\1 ","g")}1' file
A FROM B
If you just want the word that comes after FROM:
$ awk 'BEGIN{RS=""}{for(i=1;i<=NF;i++)if($i=="FROM")print $(i+1)}' file
B
Both will fail if your query has FROM inside a string value, for example in the WHERE clause:
SELECT * FROM table WHERE variable='DEATH COMES FROM ABOVE';
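gensub is specific to GNU awk. A rough portable equivalent of the first command, assuming a fixed " FROM " replacement is acceptable, uses plain gsub and spells out the case-insensitive match the way the sed attempt in the question does. The variable name merged is only for illustration, and the same caveat about FROM inside string values applies.

```shell
# RS="" puts awk in paragraph mode, so newlines are part of the record.
# gsub collapses the whitespace (including newlines) around FROM.
merged=$(printf 'A\nFROM\nB\n' |
  awk 'BEGIN { RS = "" }
       { gsub(/[ \t\n]+[Ff][Rr][Oo][Mm][ \t\n]+/, " FROM "); print }')
echo "$merged"
```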

How to check for special characters in UNIX? [duplicate]

This question already has answers here:
Regex, every non-alphanumeric character except white space or colon
(11 answers)
Closed 3 years ago.
I have a fixed-width file with 10-15 columns. The file contains alphanumeric values. How do I check for any special characters (such as !, #, $, % etc.) anywhere in the file on Unix?
Try this:
grep -vn "^[a-zA-Z0-9]*$" yourFile
or
grep -vn "^[[:alnum:]]*$" yourFile
From man grep:
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
-n, --line-number
Prefix each line of output with the 1-based line number within its input file. (-n is specified by POSIX.)
[[:alnum:]] means the character class of numbers and letters in the current locale.
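Both commands above also report lines that contain spaces, which a fixed-width file is likely to have as padding. If whitespace should be allowed, a possible variant is to search directly for any character that is neither alphanumeric nor whitespace. The three-line sample file below is invented for the demonstration, and LC_ALL=C is an assumption that restricts "alphanumeric" to ASCII.

```shell
tmp=$(mktemp)
printf '%s\n' 'AB12  CD34' 'AB#2  CD34' 'EF56  GH$8' > "$tmp"

# Print (with line numbers) every line containing at least one character
# that is neither a letter, a digit, nor whitespace.
flagged=$(LC_ALL=C grep -n '[^[:alnum:][:space:]]' "$tmp")
echo "$flagged"

rm -f "$tmp"
```

Only the second and third sample lines should be listed.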

How to remove all lines starting with Timestamp in unix

I have a text file which is basically a log file. Its entries start with a timestamp and a logId, in this format:
timestamp=2014-08-18 23:59:48.315|logId=22fef71f-979a-46aa-81b5-432d34130c34| ( followed by some text )
timestamp=2014-08-18 22:59:48.315|logId=22fef71f-979b-46aa-81b5-432d34130htf| ( followed by some text )
I need to get rid of the timestamp and keep the rest of each line.
How can I use the "sed" command in this case?
Use cut:
cut -f 2- -d \| file
-f 2- matches everything from 2nd field to the end of the line.
-d \| sets | as field separator.
Using sed:
sed 's#^[^|]*|##' file
[^|] matches anything that's not |
Output:
logId=22fef71f-979a-46aa-81b5-432d34130c34| ( followed by some text )
logId=22fef71f-979b-46aa-81b5-432d34130htf| ( followed by some text )
When you've got fields delimited by a single character ('|' in this case), cut is generally the way to go, as in konsolebox's answer. If the delimiter is not necessarily a single character (for example, any amount of white space), then awk is probably the answer.
However, since you asked specifically about sed, this will work:
sed 's/^[^|]*|//'
It substitutes (s) text starting at the beginning of the line (^) and consisting of any number of non-pipes ([^|]*) followed by a single pipe (|), replacing it with nothing (the nothing between the //).
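Both commands strip the first pipe-delimited field from every line, whatever it contains. If the log might also hold lines that do not start with timestamp= (an assumption; the question does not show any), the sed pattern can be anchored on that prefix so such lines pass through untouched. The second sample line below is made up.

```shell
tmp=$(mktemp)
printf '%s\n' \
  'timestamp=2014-08-18 23:59:48.315|logId=22fef71f-979a-46aa-81b5-432d34130c34| some text' \
  'continuation line | with a pipe' > "$tmp"

# Remove the leading field only when the line begins with "timestamp=".
stripped=$(sed 's/^timestamp=[^|]*|//' "$tmp")
echo "$stripped"

rm -f "$tmp"
```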
