I have data in below format stored in a file.
ABC:9804
{
"count" : 492,
"_shards" : {
"total" : 19,
"successful" : 19,
"failed" : 0
}
}
Bye
ABC:95023
{
"count" : 865,
"_shards" : {
"total" : 19,
"successful" : 19,
"failed" : 0
}
}
Bye
ABCC:128
{
"count" : 479,
"_shards" : {
"total" : 19,
"successful" : 19,
"failed" : 0
}
}
Bye
I am trying to get the output like
ABC:9804 , 492
ABC:95023 , 865
ABCC:128 , 479
I tried using awk to get the 1st like and 3rd line but that is not working .
awk solution:
awk '/^ABC.*:/{ abc=$0 }$0~/"count"/{ gsub(/[^0-9]+/,"",$0); print abc" , "$0 }' file
The output:
ABC:9804 , 492
ABC:95023 , 865
ABCC:128 , 479
Input
$ cat infile
ABC:9804
{
"count" : 492,
"_shards" : {
"total" : 19,
"successful" : 19,
"failed" : 0
}
}
Bye
ABC:95023
{
"count" : 865,
"_shards" : {
"total" : 19,
"successful" : 19,
"failed" : 0
}
}
Bye
ABCC:128
{
"count" : 479,
"_shards" : {
"total" : 19,
"successful" : 19,
"failed" : 0
}
}
Bye
Output
Gawk
$ awk -F':' -v RS='[{},\n]' '/ABC.*|\"count"/{gsub(/[\n\t ]+/,""); printf /\"/? ", " $2 "\n": $0}' infile
ABC:9804, 492
ABC:95023, 865
ABCC:128, 479
Better Readable
awk -F':' -v RS='[{},\n]' '
/ABC.*|\"count"/{
gsub(/[\n\t ]+/,"");
printf /\"/? ", " $2 "\n": $0
}
' infile
non-gawk
$ awk -F'[ ,]' -v OFS=", " '/^[ \t]+(ABC.*|\"count\"[ ]?):/{ sub(/^[ \t]+/,""); printf /\"/ ? OFS $(NF-1) RS: $0 }' infile
ABC:9804, 492
ABC:95023, 865
ABCC:128, 479
Better Readable
awk -F'[ ,]' -v OFS=", " '
/^[ \t]+(ABC.*|\"count\"[ ]?):/{
sub(/^[ \t]+/,"");
printf /\"/ ? OFS $(NF-1) RS: $0
}
' infile
Related
I have a large dump of data in json that looks like:
[{
"recordList" : {
"record" : [{
"Production" : {
"creator" : {
"name" : "A"
}
}
},
{
"Production" : {}
},
{
"Production" : [{
"creator" : {
"name" : "B"
},
"creator" : {
"name" : "C"
}
}]
}]
}
}]
I need to check if there is at least one creator in a record or not. If there is I give a 1 else a 0 for that field in a CSV-file.
My code:
jq -r '.[].recordList.record[]|"\(if ((.Production.creator.name)? // (.Production[]?.creator.name)?) == null or ((.Production.creator.name)?|length // (.Production[]?.creator.name)?|length) == 0 then 0 else 1 end),"' file.json
The problem is that the field 'Production' is only an array when there are multiple creators.
The result I want to get in this case is:
1,
0,
1,
jq solution:
jq -r '.[].recordList.record[].Production
| "\(if ((type == "array" and .[0].creator.name !="")
or (type == "object" and .creator.name and .creator.name !=""))
then 1 else 0 end),"' file.json
The output:
1,
0,
1,
Simplified jq solution:
jq -r '.[].recordList.record[].Production
| ((type == "array" and .[0].creator.name) or .creator.name)
| if . then "1," else "0," end' file.json
How jq filter combines the filter outputs? Following jq not generates output.json with respective input arg value ('jack').
input.json
{
"key1": "",
"key2": ""
}
jq --arg input "$username" \
'if .key1 == "<value1>"
then . + {"key1" : ($input) }
else . end' input.json |
'if .key2 == "<value2>"
then . + {"key2" : ($input) }
else . end' > output.json
output.json
{
"key1": "jack",
"key2": "jack"
}
The filter you are evidently trying to write is:
if .key1 == "" then . + {"key1" : $input } else . end
| if .key2 == "" then . + {"key2" : $input } else . end
This can be simplified to:
if .key1 == "" then .key1 = $input else . end
| if .key2 == "" then .key2 = $input else . end
You might also like to consider the following approach:
def update(f): f |= (if . == "" then $input else . end);
update(.key1) | update(.key2)
I was unable to get the erlang function running so trying with Javascript as below :
curl -XPOST http://localhost:8098/mapred \
-H "Content-Type: application/json" \
-d #- \
<<EOF
{
"inputs":"logs",
"query":[{
"map":{
"language":"javascript",
"source":"function(riakObject, keydata, arg) {
var m = riakObject.values[0].data.match(/^INFO.*cart/);
return [(m ? m.length : 0 )];
}"
},
"reduce":{
"language":"javascript",
"source":"function(values, arg){
return [values.reduce(
function(total, v){ return total + v; }, 0)
];
}"
}
}]
}
EOF
seems not working with JS as well. Shell just hangs and does not return at all. Please suggest.
** UPDATE **
Today when i tried i see the following error :
An error occurred parsing the "query" field.
["Unrecognized format of query phase:\n ",
[123,
[34,<<"map">>,34],
58,
[123,
[34,<<"language">>,34],
58,
[34,<<"javascript">>,34],
44,
[34,<<"source">>,34],
58,
[34,
<<"function(riakObject, keydata, arg) { var m = riakObject.values[0].data.match(/^INFO.*Milk/); return [(m ? m.length : 0 )]; }">>,
34],
125],
44,
[34,<<"reduce">>,34],
58,
[123,
[34,<<"language">>,34],
58,
[34,<<"javascript">>,34],
44,
[34,<<"source">>,34],
58,
[34,
<<"function(values, arg){ return [values.reduce( function(total, v){ return total + v; }, 0) ]; }">>,
34],
125],
125],
"\n\nValid formats are:\n {\"map\":{...spec...}}\n {\"reduce\":{...spec...}}\n {\"link:{...spec}}\n"]
The query element should be a list of objects, with "map" and "reduce" being in discreet objects. Your JSON has them as properties of the same object.
This worked for me:
curl -XPOST http://localhost:8098/mapred -H "Content-Type: application/json" -d '{
"inputs":"logs",
"query":[
{"map":{
"language":"javascript",
"source":"function(riakObject, keydata, arg) {
var m = riakObject.values[0].data.match(/^INFO.*cart/);
return [(m ? m.length : 0 )];
}"
}},
{"reduce":{
"language":"javascript",
"source":"function(values, arg){
return [values.reduce(
function(total, v){ return total + v; }, 0)
];
}"
}}
]
}'
I have a data set which looks like this
<SUBBEGIN
IMSI=xxxxxxxxxxxx;
MSISDN=xxxxxxxxx;
DEFCALL=TS11;
CURRENTNAM=BOTH;
CAT=COMMON;
VOLTE_TAG=NOT_DEFINED;
HLR_INDEX=1;
PS_MSISDNLESS_SUPPORTED=FALSE;
CS_MSISDNLESS_SUPPORTED=FALSE;
CSRATTYPE=NO-NO-NO-NO-NO;
PSRATTYPE=NO-NO-NO-NO-NO;
ICI=NO;
STE=NO;
<SUBEND
<SUBBEGIN
IMSI=xxxxxxxxxxxx;
MSISDN=xxxxxxxxx;
DEFCALL=TS11;
CURRENTNAM=BOTH;
VOLTE_TAG=NOT_DEFINED;
HLR_INDEX=1;
PS_MSISDNLESS_SUPPORTED=FALSE;
CS_MSISDNLESS_SUPPORTED=FALSE;
CSRATTYPE=NO-NO-NO-NO-NO;
<SUBEND
This is essentially one record and this is followed by multiple rows in the same format. I want the output to be in the format as:
IMSI|MSISDN|DEFCALL|CURRENTNAM|CAT...
xxxx|xxxx|TS11|BOTH|COMMON|COMMON
Any help is much appreciated.
$ cat tst.awk
BEGIN {FS="[=;]"; OFS="|" }
/^<SUB/ {
if (/END/) {
print (hdrPrinted++ ? "" : hdr ORS ) rec
hdr = rec = ""
}
next
}
{
sub(/^[[:space:]]+/,"")
hdr = (hdr=="" ? "" : hdr OFS) $1
rec = (rec=="" ? "" : rec OFS) $2
}
$ awk -f tst.awk file
IMSI|MSISDN|DEFCALL|CURRENTNAM|CAT|VOLTE_TAG|HLR_INDEX|PS_MSISDNLESS_SUPPORTED|CS_MSISDNLESS_SUPPORTED|CSRATTYPE|PSRATTYPE|ICI|STE
xxxxxxxxxxxx|xxxxxxxxx|TS11|BOTH|COMMON|NOT_DEFINED|1|FALSE|FALSE|NO-NO-NO-NO-NO|NO-NO-NO-NO-NO|NO|NO
$ cat test.txt
/<SUBBEGIN/ {f=1; next} # at start flag up
/<SUBEND/ { # at end
print b ORS c # print
f=0; b=c="" # flag up and reset variables
}
f { # between markers
split($1,a,"[=;]") # gather to 2 variables
b=b a[1] "|"
c=c a[2] "|"
}
Test it:
$ awk -f test.awk test.txt
IMSI|MSISDN|DEFCALL|CURRENTNAM|CAT|VOLTE_TAG|HLR_INDEX|PS_MSISDNLESS_SUPPORTED|CS_MSISDNLESS_SUPPORTED|CSRATTYPE|PSRATTYPE|ICI|STE|
xxxxxxxxxxxx|xxxxxxxxx|TS11|BOTH|COMMON|NOT_DEFINED|1|FALSE|FALSE|NO-NO-NO-NO-NO|NO-NO-NO-NO-NO|NO|NO|
Workaround using tr
tr -s '\n' ',' < file > tmpfile;
This gives me the output in form of
<SUBBEGIN IMSI=xxxxxxxxxxxx; MSISDN=xxxxxxxxx; DEFCALL=TS11; CURRENTNAM=BOTH; CAT=COMMON; VOLTE_TAG=NOT_DEFINED; HLR_INDEX=1; PS_MSISDNLESS_SUPPORTED=FALSE; CS_MSISDNLESS_SUPPORTED=FALSE; CSRATTYPE=NO-NO-NO-NO-NO; PSRATTYPE=NO-NO-NO-NO-NO; ICI=NO; STE=NO; <SUBEND
Replace the string "<SUBBEGIN" with \n
Start with this one:
sed '/<SUBBEGIN/{:a;N;/\<SUBEND/!ba;s/\n[^=]*=/ /g;s/.*SUBBEGIN//;s/;/|/g}' input
Here is an another solution:
awk script
#!/bin/awk
function print_record( hdr )
{
str = ""
for( i = 1; i <= 13; i++ )
{
if( hdr )
{
value = substr( $i, 1, index( $i, "=" ) - 1 )
}
else
{
value = substr( $i, index( $i, "=" ) + 1 )
}
gsub( /^[ \t]+/, "", value )
if( length(str) > 0 )
str = str OFS
str = str value
}
print str
}
BEGIN {
RS="<SUBBEGIN\n"
FS=";\n"
hdr=1
OFS="|"
}
{
if( index( $0, "=" ) && index( $0, ";" ) )
{
if( hdr )
{
print_record( 1 )
hdr = 0;
}
print_record( 0 )
}
}
# eof #
Input file
<SUBBEGIN
IMSI=xxxxxxxxxxxx;
MSISDN=xxxxxxxxx;
DEFCALL=TS11;
CURRENTNAM=BOTH;
CAT=COMMON;
VOLTE_TAG=NOT_DEFINED;
HLR_INDEX=1;
PS_MSISDNLESS_SUPPORTED=FALSE;
CS_MSISDNLESS_SUPPORTED=FALSE;
CSRATTYPE=NO-NO-NO-NO-NO;
PSRATTYPE=NO-NO-NO-NO-NO;
ICI=NO;
STE=NO;
<SUBEND
<SUBBEGIN
IMSI=yyyyyyyyyy;
MSISDN=yyyyyyyyy;
DEFCALL=TS11;
CURRENTNAM=BOTH;
CAT=COMMON;
VOLTE_TAG=NOT_DEFINED;
HLR_INDEX=2;
PS_MSISDNLESS_SUPPORTED=TRUE;
CS_MSISDNLESS_SUPPORTED=FALSE;
CSRATTYPE=NO-YES-NO-NO-NO;
PSRATTYPE=NO-NO-NO-YES-NO;
ICI=NO;
STE=NO;
<SUBEND
<SUBBEGIN
IMSI=zzzzzzzzzz;
MSISDN=zzzzzzzzzzzzzzz;
DEFCALL=TS11;
CURRENTNAM=BOTH;
CAT=COMMON;
VOLTE_TAG=NOT_DEFINED;
HLR_INDEX=3;
PS_MSISDNLESS_SUPPORTED=FALSE;
CS_MSISDNLESS_SUPPORTED=TRUE;
CSRATTYPE=NO-YES-YES-NO-NO;
PSRATTYPE=NO-NO-YES-YES-NO;
ICI=YES;
STE=YES;
<SUBEND
Output
$ awk -f script.awk -- input.txt
IMSI|MSISDN|DEFCALL|CURRENTNAM|CAT|VOLTE_TAG|HLR_INDEX|PS_MSISDNLESS_SUPPORTED|CS_MSISDNLESS_SUPPORTED|CSRATTYPE|PSRATTYPE|ICI|STE
xxxxxxxxxxxx|xxxxxxxxx|TS11|BOTH|COMMON|NOT_DEFINED|1|FALSE|FALSE|NO-NO-NO-NO-NO|NO-NO-NO-NO-NO|NO|NO
yyyyyyyyyy|yyyyyyyyy|TS11|BOTH|COMMON|NOT_DEFINED|2|TRUE|FALSE|NO-YES-NO-NO-NO|NO-NO-NO-YES-NO|NO|NO
zzzzzzzzzz|zzzzzzzzzzzzzzz|TS11|BOTH|COMMON|NOT_DEFINED|3|FALSE|TRUE|NO-YES-YES-NO-NO|NO-NO-YES-YES-NO|YES|YES
Hope It Helps!
with unix toolchain, perhaps the shortest...
$ sed '/^</d' file | tr '=' '\n' | tr -d ' ;' | pr -13ts'|'
IMSI|MSISDN|DEFCALL|CURRENTNAM|CAT|VOLTE_TAG|HLR_INDEX|PS_MSISDNLESS_SUPPORTED|CS_MSISDNLESS_SUPPORTED|CSRATTYPE|PSRATTYPE|ICI|STE
xxxxxxxxxxxx|xxxxxxxxx|TS11|BOTH|COMMON|NOT_DEFINED|1|FALSE|FALSE|NO-NO-NO-NO-NO|NO-NO-NO-NO-NO|NO|NO
Using a different input file for simplicity
$ cat ip.txt
<SUBBEGIN
i1=abc;
i2=ijk;
i3=xyz;
k1=NO;
t1=YES;
<SUBEND
<SUBBEGIN
i1=foo;
i2=bar;
i3=test;
k1=YES;
t1=NO;
<SUBEND
$ perl -nle '
$s=/<SUBBEGIN/ if /<SUB/;
if($s && !/<SUB/)
{
($k,$v) = /\S+(?==)|=\K[^;]+/g;
push(#key, $k);
push(#val, $v);
}
elsif(#key)
{
print join "|", #key;
print join "|", #val;
#key = ();
#val = ();
}
' ip.txt
i1|i2|i3|k1|t1
abc|ijk|xyz|NO|YES
i1|i2|i3|k1|t1
foo|bar|test|YES|NO
$s set flag if input line contains <SUBBEGIN
If flag is set and input line doesn't contain <SUB
extract key, value pair
populate them in two different arrays
Once input line contains <SUB again
Check if one of the array (say #key) is not empty
print the key value arrays with | as separator
empty the arrays
This will work whether there are empty lines or not between data structures
Would like to extract the line items, if the date range between 25-mar-2015 to 05-may-2015 from second field ($2) .
Date column is not sorted and each files contain millions of records.
Inputs.gz
Des,DateInfo,Amt,Loc,Des2
abc,02-dec-2014,10,def,xyz
abc,20-apr-2015,25,def,xyz
abc,14-apr-2015,40,def,xyz
abc,17-mar-2014,55,def,xyz
abc,24-nov-2011,70,def,xyz
abc,13-may-2015,85,def,xyz
abc,30-sep-2008,100,def,xyz
abc,20-jan-2014,115,def,xyz
abc,04-may-2015,130,def,xyz
abc,25-nov-2013,145,def,xyz
abc,29-mar-2015,55,def,xyz
I have tried like below command and in-complete :
function getDate(date) {
split(date, a, "-");
return mktime(a[3] " " sprintf("%02i",(index("janfebmaraprmayjunjulaugsepoctnovdec", a[2])+2)/3) " " a[1] " 00 00 00")
}
BEGIN {FS=","}
{ if ( getDate($2)>=getDate(25-mar-2015) && getDate($2)<=getDate(05-may-2015) ) print $0 }
Expected Output:
abc,20-apr-2015,25,def,xyz
abc,14-apr-2015,40,def,xyz
abc,04-may-2015,130,def,xyz
abc,29-mar-2015,55,def,xyz
Please suggest ... I dont have perl & python access.
$ cat tst.awk
function getDate(date, a) {
split(date, a, /-/)
return mktime(a[3]" "(index("janfebmaraprmayjunjulaugsepoctnovdec",a[2])+2)/3" "a[1]" 0 0 0")
}
BEGIN {
FS=","
beg = getDate("25-mar-2015")
end = getDate("05-may-2015")
}
{ cur = getDate($2) }
NR>1 && cur>=beg && cur<=end
$ awk -f tst.awk file
abc,20-apr-2015,25,def,xyz
abc,14-apr-2015,40,def,xyz
abc,04-may-2015,130,def,xyz
abc,29-mar-2015,55,def,xyz