I want to replace all commas between the 13th comma from the left and the 12th comma from the right in Unix

Present
1856292496,-1863203096,302,918468087151,808648712,405670043170066,919015026101,M,6,T,0,15,2c,Dear Customer, Your Request is under Process,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,03,,255,,333,ERecharge_RCOM,919015540301
Requirement
1856292496,-1863203096,302,918468087151,808648712,405670043170066,919015026101,M,6,T,0,15,2c,Dear Customer Your Request is under Process,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,03,,255,,333,ERecharge_RCOM,919015540301
Current
1856292499,-1863203087,301,918081224379,808648711,405540046666191,919026240102,M,6,T,0,15,8d,Dear Business Partner,your current Core balance is Rs.29.8,GSM balance is Rs.12892.14,MRCOM balance is Rs.1 and MRTL balance is Rs.1.Reliance,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,01,,255,,333,BalQuery_RCOM,919835853611
Requirement
1856292499,-1863203087,301,918081224379,808648711,405540046666191,919026240102,M,6,T,0,15,8d,Dear Business Partner your current Core balance is Rs.29.8 GSM balance is Rs.12892.14 MRCOM balance is Rs.1 and MRTL balance is Rs.1.Reliance,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,01,,255,,333,BalQuery_RCOM,919835853611
I need to replace all the commas between the 13th comma from the left to the 12th comma starting from the right with a space, on a Unix system.

Here's a moderately succinct but mostly inscrutable (if not incomprehensible) solution using Perl.
#!/usr/bin/perl -anlF,
use strict;
use warnings;
my $lhs = 13;
my $rhs = 13;
$, = ","; # Perl is obscure on occasion!
my($nflds) = scalar(@F);
print @F[0 .. $lhs-1], "@F[$lhs .. $nflds-$rhs-1]", @F[$nflds-$rhs .. $nflds-1]
    if ($nflds > $lhs + $rhs);
The shebang line uses -l to make Perl handle newlines automatically. See perldoc perlrun.
It also uses -F, which, in Perl 5.20.0 and later, is explicitly documented to automatically put Perl into -a (awk) mode and -n (read loop but don't print) mode. The input lines are automatically split into the array F using , as the delimiter. Earlier versions of Perl do not infer -a and -n from the presence of -F, so the code (now) uses -an as well as -F,. The updated code has been shown to work with Perl 5.10.0 and with each major release up to 5.18, as well as 5.20 and later.
The use strict; and use warnings; lines make Perl fussy. You should always use them.
The two assignments set up the values you specified in the question, except that it seems to be the 13th field from the right, rather than the 12th, that you want combined. They're easy to change if you need different values.
The $, = ","; lines sets the output field separator (OFS in Awk, and $OFS in Perl under use English qw( -no_match_vars );). See perldoc English and perldoc perlvars.
The my($nflds) = scalar(@F); line determines the number of fields.
The print line is conditional on there being enough fields.
It uses Perl's array slices to:
print fields 0..$lhs-1 as separate comma-separated fields
combine fields $lhs..$nflds-$rhs-1 as a single space-separated field (by virtue of the string around the slice)
print fields $nflds-$rhs..$nflds-1 as separate comma-separated fields
The output from that, given your input data, is:
1856292496,-1863203096,302,918468087151,808648712,405670043170066,919015026101,M,6,T,0,15,2c,Dear Customer Your Request is under Process,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,03,,255,,333,ERecharge_RCOM,919015540301
1856292499,-1863203087,301,918081224379,808648711,405540046666191,919026240102,M,6,T,0,15,8d,Dear Business Partner your current Core balance is Rs.29.8 GSM balance is Rs.12892.14 MRCOM balance is Rs.1 and MRTL balance is Rs.1.Reliance,03,11/05/2017 10:00:00,11/05/2017 10:00:00,11/,11/05/2017 10:00:00,0,01,,255,,333,BalQuery_RCOM,919835853611
Note that the leading space on one of the fields in the first line is preserved.
I didn't come up with that immediately. I generated a more verbose solution like this, first:
#!/usr/bin/env perl -l
use strict;
use warnings;
my $lhs = 13;
my $rhs = 13;
while (<>)
{
    chomp;
    my(@fields) = split /,/;
    my($nflds) = scalar(@fields);
    my(@output) = @fields;
    if ($nflds > $lhs + $rhs)
    {
        my(@combine) = @fields[$lhs .. $nflds-$rhs-1];
        my $composite = "@combine";
        @output = (@fields[0 .. $lhs-1], $composite, @fields[$nflds-$rhs .. $nflds-1]);
    }
    local $, = ",";
    print @output;
}
This produces the same output as the other script. I had the scripts called rs13.pl (verbose) and rs17.pl (compact) and checked them like this (data contains your two lines of input data):
diff <(perl rs13.pl data) <(perl rs17.pl data)
There was no difference.
There are ways to make the compact solution more compact, but I'm not sure they help much.
Here is another version that uses the splice and join functions instead of array slices. In some ways, it is tidier than the other two, but it doesn't have the same protection against lines with too few fields in them.
#!/usr/bin/perl -anlF,
use strict;
use warnings;
my $lhs = 13;
my $rhs = 13;
$, = ","; # Perl is obscure on occasion!
my($nflds) = scalar(@F);
splice(@F, $lhs, $nflds - $lhs - $rhs, join(' ', @F[$lhs .. $nflds-$rhs-1]));
print @F;
It produces the same result as the other two scripts.
Yes, you could write the code in Awk; it wouldn't be as compact as this.
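(For readers who'd rather not use Perl at all, here is a rough Python sketch of the same slice-and-join logic; it's an illustration added for comparison, not part of the original answer, and it mirrors the splice version above. It should produce the same output as the Perl scripts on the sample data.)
import sys

LHS, RHS = 13, 13   # keep the first 13 and the last 13 fields intact

for line in sys.stdin:
    fields = line.rstrip("\n").split(",")
    n = len(fields)
    if n > LHS + RHS:
        # Like Perl's splice/join: collapse the middle slice into a
        # single space-joined field. (Note: unlike Perl's default
        # split, Python keeps trailing empty fields.)
        fields[LHS:n - RHS] = [" ".join(fields[LHS:n - RHS])]
    print(",".join(fields))
Run it as python3 combine.py < data.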

Related

Unix: Using filename from another file

A basic Unix question.
I have a script which counts the number of records in a delta file.
awk '{
    n++
} END {
    if (n >= 1000) print "${completeFile}"; else print "${deltaFile}";
}' <${deltaFile} >${fileToUse}
Then, depending on the IF condition, I want to process the appropriate file:
cut -c2-11 < ${fileToUse}
But how do I use the contents of the file as the filename itself?
And if there are any tweaks to be made, feel free.
Thanks in advance
Cheers
Simon
To use the contents of a file (itself identified by a variable) as a filename, as asked:
cut -c2-11 <"$( cat $filetouse )"
# or in zsh just
cut -c2-11 <"$( < $filetouse )"
unless the filename in the file ends with one or more newline character(s), which people rarely do because it's quite awkward and inconvenient, in which case something like:
read -rdX var <$filetouse; cut -c2-11 < "${var%?}"
# where X is a character that doesn't occur in the filename
# maybe something like $'\x1f'
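(Outside the shell, the same idea is a few lines of Python. A minimal sketch with placeholder file names, subject to the same trailing-newline caveat:)
# Read the name of the file to process from fileToUse, then emulate
# cut -c2-11 (characters 2-11, i.e. the 0-based slice [1:11]).
with open("fileToUse") as f:
    target = f.read().rstrip("\n")    # roughly ${var%?}: drop the trailing newline

with open(target) as data:
    for line in data:
        print(line.rstrip("\n")[1:11])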
Tweaks: your awk prints the variable reference ${completeFile} or ${deltaFile} (because they're within the single-quoted awk script), not the value of either variable. If you actually want the value, as I'd expect from your description, you should pass the shell vars to awk vars like this
awk -vf="$completeFile" -vd="$deltaFile" '{n++} END{if(n>=1000)print f; else print d}' <"$deltaFile"
# the " around $var can be omitted if the value contains no whitespace and no glob chars
# people _often_ but not always choose filenames that satisfy this
# and they must not contain backslash in any case
or export the shell vars as env vars (if they aren't already) and access them like
awk '{n++} END{if(n>=1000) print ENVIRON["completeFile"]; else print ENVIRON["deltaFile"]}' <"$deltaFile"
Also, you don't need your own counter; awk already counts input records
awk -vf=... -vd=... 'END{if(NR>=1000)print f;else print d}' <...
or more briefly
awk -vf=... -vd=... 'END{print (NR>=1000?f:d)}' <...
or using a file argument instead of redirection so the name is available to the script
awk -vf="$completeFile" 'END{print (NR>=1000?f:FILENAME)}' "$deltaFile" # no <
and barring trailing newlines as above you don't need an intermediate file at all, just
cut -c2-11 <"$( awk -vf="$completeFile" -'END{print (NR>=1000?f:FILENAME)}' "$deltaFile")"
Or you don't really need awk; wc can do the counting and any POSIX or classic shell can do the comparison
if [ $(wc -l <"$deltaFile") -ge 1000 ]; then c="$completeFile"; else c="$deltaFile"; fi
cut -c2-11 <"$c"

grep string from a TCL variable

I want to grep a certain part of a string from a TCL variable and use that in my tool command. Example:
${tcl_Var} - this contains a string like VEG_0_1/ABC
I want to grep from the above string up to the point it hits the first forward slash, so in this case it would be VEG_0_1. And then replace it in my command. Example:
VEG_0_1/REST_OF_THE_COMMAND.
Don't think in terms of grep, think about "string manipulation" instead.
Use regsub for "search and replace":
% set tcl_Var VEG_0_1/ABC
VEG_0_1/ABC
% set newvar [regsub {/.+} $tcl_Var {/REST_OF_THE_COMMAND}]
VEG_0_1/REST_OF_THE_COMMAND
Alternately, your problem can be solved by splitting the string on /, taking the first component, then appending the "rest of the command":
% set newvar "[lindex [split $tcl_Var /] 0]/REST_OF_THE_COMMAND"
VEG_0_1/REST_OF_THE_COMMAND
Or using string indices:
% set newvar "[string range $tcl_Var 0 [string first / $tcl_Var]]REST_OF_THE_COMMAND"
VEG_0_1/REST_OF_THE_COMMAND
You can do this with regular expressions using the regsub TCL command. There is no need to run the external program grep. See more info here: http://www.tcl.tk/man/tcl8.4/TclCmd/regsub.htm
If you are new to regular expressions, read a TCL-specific tutorial about them.
set tcl_Var VEG_0_1/ABC
set varlist [split $tcl_Var "/"]
set newvar [lindex $varlist 0]/REST_OF_THE_COMMAND
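(For comparison only, the same manipulation sketched in Python; REST_OF_THE_COMMAND is the question's placeholder:)
import re

tcl_var = "VEG_0_1/ABC"

# regsub equivalent: replace everything from the first slash onward.
newvar = re.sub(r"/.+", "/REST_OF_THE_COMMAND", tcl_var)

# split equivalent: take the part before the first slash.
newvar2 = tcl_var.split("/", 1)[0] + "/REST_OF_THE_COMMAND"

print(newvar)    # VEG_0_1/REST_OF_THE_COMMAND
print(newvar2)   # VEG_0_1/REST_OF_THE_COMMAND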

read.csv.sql filter for fields with commas

I haven't been able to solve this one using the answers in this question or from the sqldf FAQs.
LOC_NAME,BIRTH_DTTM,MOM_PAT_MRN_ID,EMPI,MOM_PAT_NAME,MOM_HOSP_ADMSN_TIME,MOM_HOSP_DISCH_TIME,DEL_PROV_NAME,ATTND_PROV_NAME,DELIVERY_TYPE,PRIM.REPT,COUNT_OF_BABIES,CHILD_PED_GEST_AGE_NUM,REASON_FOR_DEL,REASON_DEL_COM,INDUCT_METHOD,INDUCT_COM,AUGMENTATION
HOSPITAL,1/1/2000 10:00,abc,Eabc,"Surname1, Given1",1/1/2000 10:00,1/3/2000 10:00,"Doctor, First","Doctor, First","C-Section, Low Transverse",Repeat,1,38,,,1) None,,1) None
HOSPITAL,1/2/2000 11:00,def,Edef,"Surname2, Given2",1/2/2000 11:00,1/5/2000 11:00,"Doctor2, First2","Doctor2, First2","C-Section, Low Transverse",Primary,1,36,Ruptured Membranes;Labor;Other (see comment),"PPROM, Preterm labor",1) None,,1) None
HOSPITAL,1/3/2000 12:00,ghi,Eghi,"Surname3, Given3",1/3/2000 12:00,1/6/2000 12:00,"Doctor3, First3","Doctor3, First3","C-Section, Low Transverse",Repeat,1,31,Other (see comment),,1) None,,1) None
HOSPITAL,1/4/2000 13:00,jkl,Ejkl,"Surname4, Given4",1/4/2000 13:00,1/7/2000 13:00,,"Doctor4, First4","Vaginal, Spontaneous Delivery",,1,28,Other (see comment),Fetal anomaly,1) oxytocin (Pitocin),,
To read in the data, I have tried:
read.csv.sql(file)
read.csv.sql(file, filter = 'tr.exe -d ^" ')
read.csv.sql(file, filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
read.csv.sql(file,
             filter = "perl -e 's{(\"[^\",]+),([^\"]+\")}{$_= $&, s/,/_/g, $_}eg'")
I'm working in R 3.0.0 with RStudio Server on Ubuntu.
Unfortunately, changing the delimiter isn't an option (nor would it be very effective for some of the files I need to query). Some of my files are pathology reports, so no matter what delimiter I use, I'm going to run into this problem.
Any hints on what I'm missing to get this to read in?
Try csvfix as in sqldf FAQ #13, but use write_dsv's default | symbol rather than ; since there are semicolons in your file:
read.csv.sql("myfile.csv", sep = "|", filter = "csvfix write_dsv")

pyparsing multiple lines optional missing data in result set

I am a fairly new pyparsing user and have a missing match I don't understand.
Here is the text I would like to parse:
polraw="""
set policy id 800 from "Untrust" to "Trust" "IP_10.124.10.6" "MIP(10.0.2.175)" "TCP_1002" permit
set policy id 800
set dst-address "MIP(10.0.2.188)"
set service "TCP_1002-1005"
set log session-init
exit
set policy id 724 from "Trust" to "Untrust" "IP_10.16.14.28" "IP_10.24.10.6" "TCP_1002" permit
set policy id 724
set src-address "IP_10.162.14.38"
set dst-address "IP_10.3.28.38"
set service "TCP_1002-1005"
set log session-init
exit
set policy id 233 name "THE NAME is 527 ;" from "Untrust" to "Trust" "IP_10.24.108.6" "MIP(10.0.2.149)" "TCP_1002" permit
set policy id 233
set service "TCP_1002-1005"
set service "TCP_1006-1008"
set service "TCP_1786"
set log session-init
exit
"""
I set up the grammar this way:
KPOL = Suppress(Keyword('set policy id'))
NUM = Regex(r'\d+')
KSVC = Suppress(Keyword('set service'))
KSRC = Suppress(Keyword('set src-address'))
KDST = Suppress(Keyword('set dst-address'))
SVC = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
ADDR = dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
EXIT = Suppress(Keyword('exit'))
EOL = LineEnd().suppress()
P_SVC = KSVC + SVC + EOL
P_SRC = KSRC + ADDR + EOL
P_DST = KDST + ADDR + EOL
x = KPOL + NUM('PId') + EOL + Optional(ZeroOrMore(P_SVC)) + Optional(ZeroOrMore(P_SRC)) + Optional(ZeroOrMore(P_DST))
for z in x.searchString(polraw):
    print z
The result set is:
['800', 'MIP(10.0.2.188)']
['724', 'IP_10.162.14.38', 'IP_10.3.28.38']
['233', 'TCP_1002-1005', 'TCP_1006-1008', 'TCP_1786']
The 800 entry is missing the service tag ???
What's wrong here?
Thanks in advance
Laurent
The problem you are seeing is that in your expression, DST's are only looked for after having skipped over optional SVC's and SRC's. You have a couple of options, I'll go through each so you can get a sense of what all is going on here.
(But first, there is no point in writing "Optional(ZeroOrMore(anything))" - ZeroOrMore already implies Optional, so I'm going to drop the Optional part in any of these choices.)
If you are going to get SVC's, SRC's, and DST's in any order, you could refactor your ZeroOrMore to accept any of the three data types, like this:
x = KPOL + NUM('PId') + EOL + ZeroOrMore(P_SVC|P_SRC|P_DST)
This will allow you to intermix different types of statements, and they will all get collected as part of the ZeroOrMore repetition.
If you want to keep these different types of statements in groups, then you can add a results name to each:
x = KPOL + NUM('PId') + EOL + ZeroOrMore(P_SVC("svc*") |
                                         P_SRC("src*") |
                                         P_DST("dst*"))
Note the trailing '*' on each name - this is equivalent to calling setResultsName with the listAllMatches argument equal to True. As each different expression is matched, the results for the different types will get collected into the "svc", "src", or "dst" results name. Calling z.dump() will list the tokens and the results names and their values, so you can see how this works.
set policy id 233
set service "TCP_1002-1005"
set dst-address "IP_10.3.28.38"
set service "TCP_1006-1008"
set service "TCP_1786"
set log session-init
exit
shows this for z.dump():
['233', 'TCP_1002-1005', 'IP_10.3.28.38', 'TCP_1006-1008', 'TCP_1786']
- PId: 233
- dst: [['IP_10.3.28.38']]
- svc: [['TCP_1002-1005'], ['TCP_1006-1008'], ['TCP_1786']]
If you wrap the P_xxx expressions in ungroup, maybe like this:
P_SVC,P_SRC,P_DST = (ungroup(expr) for expr in (P_SVC,P_SRC,P_DST))
then the output is even cleaner-looking:
['233', 'TCP_1002-1005', 'IP_10.3.28.38', 'TCP_1006-1008', 'TCP_1786']
- PId: 233
- dst: ['IP_10.3.28.38']
- svc: ['TCP_1002-1005', 'TCP_1006-1008', 'TCP_1786']
This is actually looking pretty good, but let me pass on one other option. There are a number of cases where parsers have to look for several sub-expressions in any order. Let's say they are A,B,C, and D. To accept these in any order, you could write something like OneOrMore(A|B|C|D), but this would accept multiple A's, or A, B, and C, but not D. The exhaustive/exhausting combinatorial explosion of (A+B+C+D) | (A+B+D+C) | etc. could be written, or you could maybe automate it with something like
from itertools import permutations
mixNmatch = MatchFirst(And(p) for p in permutations((A,B,C,D),4))
But there is a class in pyparsing called Each that allows you to write the same kind of thing:
Each([A,B,C,D])
meaning "must have one each of A, B, C, and D, in any order". And like And, Or, NotAny, etc., there is an operator shortcut too:
A & B & C & D
which means the same thing.
If you want "must have A, B, and C, and optionally D", then write:
A & B & C & Optional(D)
and this will parse with the same kind of behavior, looking for A, B, C, and D, regardless of the incoming order, and whether D is last or mixed in with A, B, and C. You can also use OneOrMore and ZeroOrMore to indicate optional repetition of any of the expressions.
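Here's a tiny self-contained demonstration of Each; the keywords are made up purely for illustration:
from pyparsing import Keyword, Optional

A, B, C, D = (Keyword(w) for w in "alpha beta gamma delta".split())

expr = A & B & C & Optional(D)    # Each: all required, any order, D optional

print(expr.parseString("beta alpha delta gamma"))
print(expr.parseString("gamma beta alpha"))    # D omitted - still matches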
So you could write your expression as:
x = KPOL + NUM('PId') + EOL + (ZeroOrMore(P_SVC) &
                               ZeroOrMore(P_SRC) &
                               ZeroOrMore(P_DST))
I looked at using results names with this expression, and the ZeroOrMore's seem to be confusing things, maybe still a bug in how this is done. So you may have to reserve using Each for more basic cases like my A,B,C,D example. But I wanted to make you aware of it.
Some other notes on your parser:
dblQuotedString.setParseAction(lambda t: t[0].replace('"','')) is probably better written dblQuotedString.setParseAction(removeQuotes). You don't have any embedded quotes in your examples, but it's good to be aware of where your assumptions might not translate to a future application. Here are a couple of ways of removing the defining quotes:
dblQuotedString.setParseAction(lambda t: t[0].replace('"',''))
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \ and an ending quote \'
# removed leading and trailing "s, but also internal ones too, which are
# really part of the quoted string
dblQuotedString.setParseAction(lambda t: t[0].strip('"'))
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \" and an ending quote \'
# removed leading and trailing "s, leaves the internal one, but strips off
# the escaped ending quote
dblQuotedString.setParseAction(removeQuotes)
print dblQuotedString.parseString(r'"This is an embedded quote \" and an ending quote \""')[0]
# prints 'This is an embedded quote \" and an ending quote \"'
# just removes leading and trailing " characters, leaves escaped "s in place
KPOL = Suppress(Keyword('set policy id')) is a bit fragile, as it will break if there are any extra spaces between 'set' and 'policy', or between 'policy' and 'id'. I usually define these kind of expressions by first defining all the keywords individually:
SET,POLICY,ID,SERVICE,SRC_ADDRESS,DST_ADDRESS,EXIT = map(Keyword,
    "set policy id service src-address dst-address exit".split())
and then define the separate expressions using:
KSVC = Suppress(SET + SERVICE)
KSRC = Suppress(SET + SRC_ADDRESS)
KDST = Suppress(SET + DST_ADDRESS)
Now your parser will cleanly handle extra whitespace (or even comments!) between individual keywords in your expressions.
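Putting those suggestions together, the whole parser might end up looking something like this - a sketch assembled from the pieces above, not verbatim tested code:
from pyparsing import (Keyword, Suppress, Regex, dblQuotedString,
                       removeQuotes, LineEnd, ZeroOrMore)

SET, POLICY, ID, SERVICE, SRC_ADDRESS, DST_ADDRESS = map(
    Keyword, "set policy id service src-address dst-address".split())

EOL = LineEnd().suppress()
QSTR = dblQuotedString.setParseAction(removeQuotes)

KPOL = Suppress(SET + POLICY + ID)
P_SVC = Suppress(SET + SERVICE) + QSTR + EOL
P_SRC = Suppress(SET + SRC_ADDRESS) + QSTR + EOL
P_DST = Suppress(SET + DST_ADDRESS) + QSTR + EOL

x = (KPOL + Regex(r'\d+')('PId') + EOL +
     ZeroOrMore(P_SVC("svc*") | P_SRC("src*") | P_DST("dst*")))

for z in x.searchString(polraw):    # polraw as defined in the question
    print(z.dump())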

Delete a line with a pattern

Hi, I want to delete a line from a file which matches a particular pattern.
The code I am using is:
BEGIN {
    FS = "!";
    stopDate = "date +%Y%m%d%H%M%S";
    deletedLineCtr = 0; # diagnostics counter, unused at this time
}
{
    if ($7 < stopDate)
    {
        deletedLineCtr++;
    }
    else
        print $0
}
The file's lines are "!"-separated, and the 7th field is a date in yyyymmddhhmmss format. The script deletes a line whose date is less than the system date. But this doesn't work. Can anyone tell me the reason?
Is the awk(1) assignment due Tuesday? Really, awk?? :-)
Ok, I wasn't sure exactly what you were after so I made some guesses. This awk program gets the current time of day and then removes every line in the file less than that. I left one debug print in.
BEGIN {
    FS = "!"
    stopDate = strftime("%Y%m%d%H%M%S")
    print "now: ", stopDate
}
{ if ($7 >= stopDate) print $0 }
$ cat t2.data
!!!!!!20080914233848
!!!!!!20090914233848
!!!!!!20100914233848
$ awk -f t2.awk < t2.data
now: 20090914234342
!!!!!!20100914233848
$
call date first to pass the formatted date as a parameter:
awk -F'!' -v stopdate=$( date +%Y%m%d%H%M%S ) '
$7 < stopdate { deletedLineCtr++; next }
{print}
END {do something with deletedLineCtr...}
'
You would probably need to run the date command - maybe with backticks - to get the date into stopDate. If you printed stopDate with the code as written, it would contain "date +...", not a string of digits. That is the root cause of your problem.
Unfortunately...
I cannot find any evidence that backticks work in any version of awk (old awk, new awk, GNU awk). So, you either need to migrate the code to Perl (Perl was originally designed as an 'awk-killer' - and still includes a2p to convert awk scripts to Perl), or you need to reconsider how the date is set.
Seeing @DigitalRoss's answer, the strftime() function in gawk provides you with the formatting you want (check 'info gawk' as I did).
With that fixed, you should be getting the right lines deleted.
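(And if neither gawk's strftime nor an external date call suits you, the same filter is a few lines of Python; a sketch that reads stdin and writes the kept lines to stdout:)
import sys
from datetime import datetime

# Current time in the same sortable yyyymmddhhmmss form as field 7,
# so a plain string comparison works.
stop_date = datetime.now().strftime("%Y%m%d%H%M%S")

deleted = 0
for line in sys.stdin:
    fields = line.rstrip("\n").split("!")
    if len(fields) >= 7 and fields[6] < stop_date:
        deleted += 1                  # older than now: drop it
        continue
    sys.stdout.write(line)

sys.stderr.write("deleted: %d\n" % deleted)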