extract a string after a pattern - unix

I want to extract the numbers following client_id and id, and pair the client_id with each id on every line.
For example, for the following lines of log,
User(client_id:03)) results:[RelatedUser(id:204, weight:10), RelatedUser(id:491, weight:10), RelatedUser(id:29, weight: 20)
User(client_id:04)) results:[RelatedUser(id:209, weight:10), RelatedUser(id:301, weight:10)
User(client_id:05)) results:[RelatedUser(id:20, weight: 10)
I want to output
03 204
03 491
03 29
04 209
04 301
05 20
I know I need to use sed or awk. But I do not know exactly how.
Thanks

This may work for you:
awk -F "[):,]" '{ for (i=2; i<=NF; i++) if ($i ~ /id/) print $2, $(i+1) }' file
Results:
03 204
03 491
03 29
04 209
04 301
05 20
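To try it without creating a file, the same command can be fed the sample lines through a here-document (a sketch):

```shell
awk -F "[):,]" '{ for (i=2; i<=NF; i++) if ($i ~ /id/) print $2, $(i+1) }' <<'EOF'
User(client_id:03)) results:[RelatedUser(id:204, weight:10), RelatedUser(id:491, weight:10), RelatedUser(id:29, weight: 20)
User(client_id:04)) results:[RelatedUser(id:209, weight:10), RelatedUser(id:301, weight:10)
User(client_id:05)) results:[RelatedUser(id:20, weight: 10)
EOF
```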

Here's an awk script that works (I put it on multiple lines and made it a bit more verbose so you can see what's going on):
#!/bin/bash
awk 'BEGIN { FS = "[\(\):,]" }
/client_id/ {
    cid = "no_client_id"
    for (i = 1; i < NF; i++) {
        if ($i == "client_id") {
            cid = $(i+1)
        } else if ($i == "id") {
            id = $(i+1)
            print cid OFS id
        }
    }
}' input_file_name
Output:
03 204
03 491
03 29
04 209
04 301
05 20
Explanation:
awk 'BEGIN{FS="[\(\):,]"}: invoke awk, using ( ) : and , as delimiters to separate the fields
/client_id/ {: only process lines that contain client_id
for (i=1; i<NF; i++) {: iterate through the fields on each line, one field at a time
if ($i == "client_id") { cid = $(i+1) }: if the field we are currently on is client_id, then its value is the next field
else if ($i == "id") { id = $(i+1); print cid OFS id }: otherwise, if the field we are currently on is id, print the client_id and id pair to stdout
input_file_name: supply the name of your input file as the first argument to the awk script.

This might work for you (GNU sed):
sed -r '/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d;s//\2 \3\n\1/;P;D' file
/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d if the line doesn't contain the intended strings, delete it.
s//\2 \3\n\1/ re-arrange the line by copying the client_id and moving the first id ahead, thus reducing the line for successive iterations.
P print up to the introduced newline.
D delete up to the introduced newline.
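For completeness, here is that one-liner run against the sample log via a here-document (a sketch; GNU sed is required for -r and for \n in the replacement):

```shell
sed -r '/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d;s//\2 \3\n\1/;P;D' <<'EOF'
User(client_id:03)) results:[RelatedUser(id:204, weight:10), RelatedUser(id:491, weight:10), RelatedUser(id:29, weight: 20)
User(client_id:04)) results:[RelatedUser(id:209, weight:10), RelatedUser(id:301, weight:10)
User(client_id:05)) results:[RelatedUser(id:20, weight: 10)
EOF
```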

I would prefer awk for this, but if you were wondering how to do this with sed, here's one way that works with GNU sed.
parse.sed
/client_id/ {
:a
s/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/
ta
s/^[^\n]+\n//
}
Run it like this:
sed -rf parse.sed infile
Or as a one-liner:
<infile sed '/client_id/ { :a; s/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/; ta; s/^[^\n]+\n//; }'
Output:
03 204
03 491
03 29
04 209
04 301
05 20
Explanation:
The idea is to repeatedly match client_id:([0-9]+) and id:([0-9]+) pairs and put them at the end of the pattern space. On each pass the matched id:([0-9]+) is removed.
The final replacement removes the left-overs from the loop.

Related

Awk program to compare number of fields by space of each line

I am trying to check whether each line in a file has the same length (or number of fields).
I am doing the following, but it does not seem to work.
NR==1 {length=NF}
NR>1 && NF!=length {print}
Can this be done with an awk one-liner? A program is fine too.
A sample of input would be:
12 34 54 56
12 89 34 33
12
29 56 42 42
My expected output would be "yes" or "no" if they have the same number of fields or not.
You could try this command which checks the number of fields in each line and compares it to the number of fields of the first line:
awk 'NR==1{a=NF; b=0} (NR>1 && NF!=a){print "No"; b=1; exit 1}END{if (b==0) print "Yes"}' test.txt
Checking is aborted at the first line whose number of fields differs from the first line of input.
For input
12 43 43
12 32
you will get "No"
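Both outcomes can be exercised quickly from the shell (a sketch using printf in place of test files):

```shell
# Non-uniform input: the check stops at the first mismatching line
# (the command also sets exit status 1 to signal the mismatch)
printf '12 43 43\n12 32\n' |
awk 'NR==1{a=NF; b=0} (NR>1 && NF!=a){print "No"; b=1; exit 1} END{if (b==0) print "Yes"}' || true

# Uniform input: every line has the same number of fields
printf '12 34 54 56\n12 89 34 33\n' |
awk 'NR==1{a=NF; b=0} (NR>1 && NF!=a){print "No"; b=1; exit 1} END{if (b==0) print "Yes"}'
```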
Try:
awk 'BEGIN{a="yes"} last!="" && NF!=last{a="no"; exit} {last=NF} END{print a}' file
How it works
BEGIN{a="yes"}
This initializes the variable a to yes. (We assume all lines have the same number of fields until proven otherwise.)
last!="" && NF!=last{a="no"; exit}
If last has been assigned a value and the number of fields on the current line is not the same as last, then set a to no and exit.
{last=NF}
Update last to the number of fields on the current line.
END{print a}
Before exiting, print a.
Examples
$ cat file1
2 34 54 56
12 89 34 33
12
29 56 42 42
$ awk 'BEGIN{a="yes"} last!="" && NF!=last{a="no"; exit} {last=NF} END{print a}' file1
no
$ cat file2
2 34 54 56
12 89 34 33
29 56 42 42
$ awk 'BEGIN{a="yes"} last!="" && NF!=last{a="no"; exit} {last=NF} END{print a}' file2
yes
I am assuming that you want to check whether all lines of the file have the same number of fields; if that is the case, then try the following.
awk '
FNR==1 {
    value = NF
    count++
    next
}
{
    if (NF == value) count++
}
END {
    if (count == FNR) {
        print "All lines have the same number of fields"
    }
    else {
        print "All lines do NOT have the same number of fields."
    }
}
' Input_file
Additional stuff (only if required): in case you also want to print the contents of the file, along with the message, when all of its lines have the same number of fields, then try the following.
awk '
{
    val = (val ? val ORS $0 : $0)
}
FNR==1 {
    value = NF
    count++
    next
}
{
    if (NF == value) count++
}
END {
    if (count == FNR) {
        print "All lines have the same number of fields" ORS val
    }
    else {
        print "All lines do NOT have the same number of fields."
    }
}
' Input_file
This should do it:
$ awk 'NR==1{p=NF} p!=NF{s=1; exit} END{print s?"No":"Yes"}' file
However, setting the exit status would be better if this will be part of a workflow.
Since equality is transitive, there is no need to keep NF from any line other than the first; using 0 as the success value means s doesn't require initialization to a default value.
An efficient even-fields shell function, using sed to construct a regex (based on the first line of input) to feed to GNU grep, which looks for field-count mismatches:
# Usage: ef filename
ef() { sed '1s/[^ ]*/[^ ]*/g;q' "$1" | grep -v -m 1 -q -f - "$1" \
&& echo no || echo yes ; }
For files with uneven fields, grep -m 1 quits after the first non-uniform line -- so if the file is a million lines long but the mismatch occurs on line #2, grep only needs to read two lines, not a million. On the other hand, if there is no mismatch, grep has to read the whole file.
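A quick demonstration, using hypothetical temp files (GNU grep assumed for -m, as noted above):

```shell
# ef: "even fields" -- build a regex from line 1, look for a line that fails it
ef() { sed '1s/[^ ]*/[^ ]*/g;q' "$1" | grep -v -m 1 -q -f - "$1" \
       && echo no || echo yes ; }

printf '2 34 54 56\n12 89 34 33\n12\n' > /tmp/uneven.txt   # last line has 1 field
printf '2 34 54 56\n12 89 34 33\n' > /tmp/even.txt         # all lines have 4 fields
ef /tmp/uneven.txt   # prints: no
ef /tmp/even.txt     # prints: yes
```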

Drop the first 4 columns

I have a command that drops the first 4 columns, but unfortunately if the 2nd column's name and the 4th column's name are similar it truncates at the 2nd column, while if they are not the same it truncates at the 4th column as intended. Is there anything wrong with my command?
awk -F"|" 'NR==1 {h=substr($0, index($0,$5)); next}
{file= path ""$1""$2"_"$3"_"$4"_03042017.csv"; print (a[file]++?"": "DETAILS 03042017" ORS h ORS) substr($0, index($0,$5)) > file}
END{for(file in a) print "EOF " a[file] > file}' filename
Input:
Account Num | Name | Card_Holder_Premium | Card_Holder| Type_Card | Balance | Date_Register
01 | 02 | 03 | 04 | 05 | 06 | 07
Output:
_Premium | Card_Holder| Type_Card | Balance | Date_Register
04 | 05 | 06 | 07
My desired output:
Card_Holder| Type_Card | Balance | Date_Register
05 | 06 | 07
Is this all you're trying to do?
$ sed -E 's/([^|]+\| ){4}//' file
April | May | June
05 | 06 | 07
$ awk '{sub(/([^|]+\| ){4}/,"")}1' file
April | May | June
05 | 06 | 07
The method you use to remove columns, based on index, is not correct. As you have figured out, index can be confused and match within an earlier field when that field contains the same text as a later one.
The correct way is the one advised by Ed Morton.
In this online test, the code below, based on Ed Morton's suggestion, gives the output you expect:
awk -F"|" 'NR==1 {sub(/([^|]+\|){3}/,""); h=$0; next}
    {file=$1$2"_"$3"_"$4"_03042017.csv"; sub(/([^|]+\|){3}/,"")
     print (a[file]++?"": "DETAILS 03042017" ORS h ORS) $0 > file}
    END{for(file in a) print "EOF " a[file] > file}' file1.csv
#Output
DETAILS 03042017
Card_Holder| Type_Card | Balance | Date_Register
04 | 05 | 06 | 07
EOF 1
Due to the whitespace that you have included in your fields, the filename of the generated file appears as 01 02 _ 03 _ 04 _03042017.csv. With your real data this filename should come out correct.
In any case, I just adapted Ed Morton's answer to your code. If you are happy with this solution you should accept Ed Morton's answer.
PS: I removed one space from Ed Morton's answer since it seems to work a bit better with your not-so-clean data.
Ed Suggested:
awk '{sub(/([^|]+\| ){4}/,"")}1' file
#Mind this space ^
With this space, the pattern might fail to match your data if there is no space after a field (i.e. April|May).
On the other hand, by removing this space, Ed's solution correctly matches fields in either format, April | May or April|May.

List only hex value named files

Using QNX, I'm creating a script that will list only hex-valued file names under 1F.
/path# ls
. 05 09 0B pubsub09
.. 07 09_sub 0E
04 08 0A 81
/path#
I have code that should list only hex values, but it still lists the whole directory.
ls /path/ |
while read fname
do
if [ "ibase=16; $fname" ]
then
echo "$fname"
fi
done
return 0
Try this instead
if [[ $fname =~ ^[[:xdigit:]]+$ ]]
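Here is a sketch of the complete loop built around that test; the file list is a hypothetical stand-in for the ls output, and the 16#... base-16 arithmetic (available in bash and ksh93) handles the "under 1F" requirement, which the regex alone does not:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the directory listing in the question
files='04 05 07 08 09 09_sub 0A 0B 0E 81 pubsub09'

for fname in $files; do
    # keep names made only of hex digits whose value is below 0x1F
    if [[ $fname =~ ^[[:xdigit:]]+$ ]] && (( 16#$fname < 16#1F )); then
        echo "$fname"
    fi
done
```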

second column of a file equals some variables

If the second column of a file equals one of the number below :
65,81,83,97,99,113,129,145,147,161,163,177 #To be read 65 OR 81 OR 83 OR 97 OR 99 etc..
then I need to print the whole line to an output file, so I typed the following:
samtools view myfile.bam | awk '{for (i=65,81,83,97,99,113,129,145,147,161,163,177) if ($2==i) ; print$0} > output.bam
Would that work?
You can build a dict which contains a list of numbers in the BEGIN block.
Then use dict as a filter.
awk '
BEGIN {
dict[65]
dict[81]
# skip
dict[177]
}
$2 in dict' file.txt
If you have a long list, rather than having a lot of explicit assignments:
awk 'BEGIN {
numlist = "65,81,83,97,99,113,129,145,147,161,163,177"
split(numlist, a, ",")
for (i in a) {
nums[a[i]]
}
}
$2 in nums' inputfile
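A quick sketch of the split-based filter on hypothetical input:

```shell
# Keep only the lines whose 2nd field appears in the comma-separated list
printf 'a 65 keep\nb 66 drop\nc 177 keep\n' |
awk 'BEGIN {
         numlist = "65,81,83,97,99,113,129,145,147,161,163,177"
         split(numlist, a, ",")
         for (i in a) nums[a[i]]   # referencing an element creates the key
     }
     $2 in nums'
# prints the lines for 65 and 177 only
```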

Awk script for subtracting from the above field

Hi, I have an input file with a single field:
30
58
266
274
296
322
331
I need the output to be the difference of the 2nd and 1st rows (58-30=28), the 3rd and 2nd rows (266-58=208), and so on.
My output should look like below:
30 30
58 28
266 208
274 8
Any help please?
data=`cat file | xargs`
echo $data | awk '{a=0; for(i=1; i<=NF;i++) { print $i, $i-a; a=$i}}'
30 30
58 28
266 208
274 8
296 22
322 26
331 9
Update upon comment Without cat/xargs:
awk '{printf "%d %d\n", $1, $1-a; a=$1;}' file
You don't actually need the for loop from Khachick's answer, as awk will go through all the rows anyway. Simpler is:
cat file | awk 'BEGIN { a=0 } { print $1, $1-a; a=$1 }'
However, it is also possible to skip the first row, if you don't really want it, by initialising a flag in the BEGIN block and skipping the print while the flag still has its initial value. Sort of like:
BEGIN { started=0 } { if (started == 0) { started = 1 } else { print $1, $1-a }; a = $1 }
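For reference, the minimal version can be tried on the question's numbers (a sketch; an unset a evaluates to 0, which produces the 30 30 first row):

```shell
# Print each value alongside its difference from the previous value
printf '%s\n' 30 58 266 274 296 322 331 |
awk '{ print $1, $1 - a; a = $1 }'
```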
