gawk - Trouble with arrays of arrays to catch duplicate strings

gawk - Trouble with arrays of arrays to catch duplicate strings - multidimensional-array

I am using gawk 4.1.1 to parse Cisco IOS configuration files, comparing switch configs to the routers above them, to eventually pull out IP addresses. My problem is I need to skip a record (interface) if I have already processed the vlan before. The catch is I am processing multiple files from different sites, and each site has it's own vlans.
So I thought an array of arrays would work best, where array[pop] could be array[den01], array[nyc01], etc. Dynamically creating those index-arrays from the variable pop, pulled from a text file.
The data processing works fine, except that it produces duplicate vlan numbers at a given site. The "dbfile" contains every vlan seen on every switch, from every site, and some people have redundant links so their vlan shows up twice.
Below is a snippet of my code:
#!/bin/bash
for i in `cat dbfile` ]; do
# set lots of variables to pull into awk
awk -v i=$i -v name=$name -v intname=$intname -v cvupfile=$cvupfile -v pop=$pop -v cid=$cid -v custvlan=$custvlan 'BEGIN { RS="!"; FS=" "; array[pop][0] = "" }
{
if ( $1 ~ "interface" && $2 ~ "Vlan" )
{
# trim Vlan ID so only the actual vlan number remains
seenvlan=gensub(/^Vlan/, "", "g", $2)
if ( seenvlan == custvlan )
{
# this is where trouble starts. I never get this IF to be true
if ( custvlan in array[pop] )
{
print "Duplicate VLAN found!"
}
else
{
# vlan is unique, make index for it
array[pop][custvlan]=custvlan
# debug output
for ( i in array )
print "This is first element: " i
for ( j in array[i] )
print "This is second element: " array[i][j]
print "Here is value from array: " array[pop][custvlan]
exit
}
}
}
}
END {
}' $crt
done
Here is some sample output. It shows 3 records processed. The fact that "This is the second element" is printed twice seems like a clue, but I'm not sure what it means. I am making a bunch of null indexes under array[pop]?
This is first element: den01
This is second element:
This is second element: 235
Here is value from array: 235
This is first element: den01
This is second element:
This is second element: 279
Here is value from array: 279
This is first element: den01
This is second element:
This is second element: 131
Here is value from array: 131

Related

Need of awk command explaination

I want to know how the below command is working.
awk '/Conditional jump or move depends on uninitialised value/ {block=1} block {str=str sep $0; sep=RS} /^==.*== $/ {block=0; if (str!~/oracle/ && str!~/OCI/ && str!~/tuxedo1222/ && str!~/vprintf/ && str!~/vfprintf/ && str!~/vtrace/) { if (str!~/^$/){print str}} str=sep=""}' file_name.txt >> CondJump_val.txt
I'd also like to know how to check the texts Oracle, OCI, and so on from the second line only.

The first step is to write it so it's easier to read
awk '
/Conditional jump or move depends on uninitialised value/ {block=1}
block {
str=str sep $0
sep=RS
}
/^==.*== $/ {
block=0
if (str!~/oracle/ && str!~/OCI/ && str!~/tuxedo1222/ && str!~/vprintf/ && str!~/vfprintf/ && str!~/vtrace/) {
if (str!~/^$/) {
print str
}
}
str=sep=""
}
' file_name.txt >> CondJump_val.txt
It accumulates the lines starting with "Conditional jump ..." ending with "==...== " into a variable str.
If the accumulated string does not match several patterns, the string is printed.
I'd also like to know how to check the texts Oracle, OCI, and so on from the second line only.
What does that mean? I assume you don't want to see the "Conditional jump..." line in the output. If that's the case then use the next command to jump to the next line of input.
/Conditional jump or move depends on uninitialised value/ {
block=1
next
}

perhaps consolidate those regex into a single chain ?
if (str !~ "oracle|OCI|tuxedo1222|v[f]?printf|vtrace") {
print str
}

There are two idiomatic awkisms to understand.
The first can be simplified to this:
$ seq 100 | awk '/^22$/{flag=1}
/^31$/{flag=0}
flag'
22
23
...
30
Why does this work? In awk, flag can be tested even if not yet defined which is what the stand alone flag is doing - the input is only printed if flag is true and flag=1 is only executed when after the regex /^22$/. The condition of flag being true ends with the regex /^31$/ in this simple example.
This is an idiom in awk to executed code between two regex matches on different lines.
In your case, the two regex's are:
/Conditional jump or move depends on uninitialised value/ # start
# in-between, block is true and collect the input into str separated by RS
/^==.*== $/ # end
The other 'awkism' is this:
block {str=str sep $0; sep=RS}
When block is true, collect $0 into str and first time though, RS should not be added in-between the last time. The result is:
str="first lineRSsecond lineRSthird lineRS..."
both depend on awk being able to use a undefined variable without error

Implement tr and sed functions in awk

I need to process a text file - a big CSV - to correct format in it. This CSV has a field which contains XML data, formatted to be human readable: break up into multiple lines and indentation with spaces. I need to have every record in one line, so I am using awk to join lines, and after that I am using sed, to get rid of extra spaces between XML tags, and after that tr to eliminate unwanted "\r" characters.
(the first record is always 8 numbers and the fiels separator is the pipe character: "|"
The awk scrips is (join4.awk)
BEGIN {
# initialise "line" variable. Maybe unnecessary
line=""
}
{
# check if this line is a beginning of a new record
if ( $0 ~ "^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]|" ) {
# if it is a new record, then print stuff already collected
# then update line variable with $0
print line
line = $0
} else {
# if it is not, then just attach $0 to the line
line = line $0
}
}
END {
# print out the last record kept in line variable
if (line) print line
}
and the commandline is
cat inputdata.csv | awk -f join4.awk | tr -d "\r" | sed 's/> *</></g' > corrected_data.csv
My question is if there is an efficient way to implement tr and sed functionality inside the awk script? - this is not Linux, so I gave no gawk, just simple old awk and nawk.
thanks,
--Trifo

tr -d "\r"
Is just gsub(/\r/, "").
sed 's/> *</></g'
That's just gsub(/> *</, "><")

mawk NF=NF RS='\r?\n' FS='> *<' OFS='><'

Thank you all folks!
You gave me the inspiration to get to a solution. It is like this:
BEGIN {
# initialize "line" variable. Maybe unnecessary.
line=""
}
{
# if the line begins with 8 numbers and a pipe char (the format of the first record)...
if ( $0 ~ "^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\|" ) {
# ... then the previous record is ready. We can post process it, the print out
# workarounds for the missing gsub function
# removing extra spaces between xml tags
# removing extra \r characters the same way
while ( line ~ "\r") { sub( /\r/,"",line) }
# "<text text> <tag tag>" should look like "<text text><tag tag>"
while ( line ~ "> *<") { sub( /> *</,"><",line) }
# then print the record and update line var with the beginning of the new record
print line
line = $0
} else {
# just keep extending the record with the actual line
line = line $0
}
}
END {
# print the last record kept in line var
if (line) {
while ( line ~ "\r") { sub( /\r/,"",line) }
while ( line ~ "> *<") { sub( /> *</,"><",line) }
print line
}
}
And yes, it is efficient: the embedded version runs abou 33% faster.
And yes, it would be nicer to create a function for the postprocessing of the records in "line" variable. Now I have to write the same code twice to process the last recond in the END section. But it works, it creates the same output as the chained commands and it is way faster.
So, thanks for the inspiration again!
--Trifo

How to get the variable's name from a file using source command in UNIX?

I have a file named param1.txt which contains certain variables. I have another file as source1.txt which contains place holders. I want to replace the place holders with the values of the variables that I get from the parameter file.
I have basically hard coded the script where the variable names in the parameter.txt file is known before hand. I want to know a dynamic solution to the problem where the variable names will not be known beforehand. In other words, is there any way to find out the variable names in a file using the source command in UNIX?
Here is my script and the files.
Script:
#!/bin/bash
source /root/parameters/param1.txt
sed "s/{DB_NAME}/$DB_NAME/gI;
s/{PLANT_NAME}/$PLANT_NAME/gI" \
/root/sources/source1.txt >
/root/parameters/Output.txt`
param1.txt:
PLANT_NAME=abc
DB_NAME=gef
source1.txt:
kdashkdhkasdkj {PLANT_NAME}
jhdbjhasdjdhas kashdkahdk asdkhakdshk
hfkahfkajdfk ljsadjalsdj {PLANT_NAME}
{DB_NAME}

I cannot comment since I don't have enough points.
But is it correct that this is what you're looking for:
How to reference a file for variables using Bash?
Your problem statement isn't very clear to me. Perhaps you can simplify your problem and desired state.

Don't understand why you try to source param1.txt.
You can try with this awk :
awk '
NR == FNR {
a[$1] = $2
next
}
{
for ( i = 1 ; i <= NF ; i++ ) {
b = $i
gsub ( "^{|}$" , "" , b )
if ( b in a )
sub ( "{" b "}" , a[b] , $i )
}
} 1' FS='=' param1.txt FS=" " source1.txt

Split line into multiple lines of 42 Unix after last given char

I have a text file in unix formed from multiple long lines
ALTER Tit como(titel('42423432;434235111;757567562;2354679;5543534;6547673;32322332;54545453'))
ALTER Mit como(Alt('432322;434434211;754324237562;2354679;5543534;6547673;32322332;54545453'))
I need to split each line in multiple lines of no longer than 42 characters.
The split should be done at the end of last ";", and
so my ideal output file will be :
ALTER Tit como(titel('42423432;434235111; -
757567562;2354679;5543534;6547673; -
32322332;54545453'))
ALTER Mit como(Alt('432322;434434211; -
754324237562;2354679;5543534;6547673; -
32322332;54545453'))
I used fold -w 42 givenfile.txt | sed 's/ $/ -/g'
it splits the line but doesnt add the "-" at the end of the line and doesnt split after the ";".
any help is much appreciated.
Thanks !

awk -F';' '
w{
print""
}
{
w=length($1)
printf "%s",$1
for (i=2;i<=NF;i++){
if ((w+length($i)+1)<42){
w+=length($i)+1
printf";%s",$i
} else {
w=length($i)
printf"; -\n%s",$i
}
}
}
END{
print""
}
' file
This produces the output:
ALTER Tit como(titel('42423432;434235111; -
757567562;2354679;5543534;6547673; -
32322332;54545453'))
ALTER Mit como(Alt('432322;434434211; -
754324237562;2354679;5543534;6547673; -
32322332;54545453'))
How it works
Awk implicitly loops through each line of its input and each line is divided into fields. This code uses a single variable w to keep track of the current width of the output line.
-F';'
Tell awk to break fields on semicolons.
`w{print""}
If the last line was not completed, w>0, then print a newline to terminate it before we start with a new line.
w=length($1); printf "%s",$1
Print the first field of the new line and set w according to its length.
Loop over the remaining fields:
for (i=2;i<=NF;i++){
if ((w+length($i)+1)<42){
w+=length($i)+1
printf";%s",$i
} else {
w=length($i)
printf"; -\n%s",$i
}
}
This loops over the second to final fields of this line. Whenever we reach the point where we can't print another field without exceeding the 42 character limit, we print ; -\n.
END{print""}
Print a newline at the end of the file.

This might work for you (GNU sed):
sed -r 's/.{1,42}$|.{1,41};/& -\n/g;s/...$//' file
This globally replaces 1 to 41 characters followed by a ; or 1 to 42 characters followed by end of line with -\n. The last string will have three characters too many and so they are deleted.

Function to create the array by reading the file

I am creating scripts which will store the contents of pipe delimited file. Each column is stored in a separate array. I then read the information from the arrays and process it. There are 20 pipe delimited files and I need to write 20 scripts. The processing that will happen in each script after the information is stored in the array is different. The number of columns in each pipe delimited file is different (but in no case it would be more than 9 columns). I need to do this activity of storing the information in the array in the beginning of each script. The way I am doing it at present is given below. I want help from you to understand how can I write a function to do this activity.
cat > example_file.txt <<End-of-message
some text first row|other text first row|some other text first row
some text nth row|other text nth row|some other text nth row
End-of-message
# Note that example_file.txt will available. I have created it inside the script just to let you know the format of the file
OIFS=$IFS
IFS='|'
i=0
while read -r first second third ignore
do
first_arr[$i]=$first
second_arr[$i]=$second
third_arr[$i]=$third
(( i=i+1 ))
done < example_file.txt
IFS=$OIFS

Here is a sort-of minimal change to your script that should get you further...
...
...
while read -r first second third ignore
do
arr0[$i]=$first
arr1[$i]=$second
arr2[$i]=$third
(( i=i+1 ))
done < example_file.txt
IFS=$OIFS
proc0 () {
for j in "$#"; do
echo proc0 : "$j"
done
}
proc1 () {
echo proc1
}
proc2 () {
echo proc2
}
for i in 0 1 2; do
t=arr$i'[#]'
proc$i "${!t}"
done

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

gawk - Trouble with arrays of arrays to catch duplicate strings - multidimensional-array

Related

Need of awk command explaination

Implement tr and sed functions in awk

How to get the variable's name from a file using source command in UNIX?

Split line into multiple lines of 42 Unix after last given char

Function to create the array by reading the file

Categories

Resources