not able to understand NAWK use - unix

I found a command which takes input data from a file and writes it into an output file.
nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg
It's working, but I can't figure out how. Could anyone please explain how the above command works and what it is doing? This nawk is too tough to understand. :(
Thanks in advance.

nawk is not tough to understand; it is much like other languages. I suspect you cannot follow it because it is not properly formatted. Once you format it, you will see how it works.
To answer your question: this command searches the given input file for lines matching an input text, and prints a few lines before each matched line and a few lines after it. The number of lines printed is controlled by the variables "b" (number of lines before) and "a" (number of lines after), and the text to search for is passed in the variable "s". Note that $0 ~ s treats s as a regular expression, so each "." in the search text matches any character.
This command is helpful in debugging and troubleshooting, where one wants to extract lines from large log files (difficult to open in vi or another editor on UNIX/Linux) by searching for some error text and printing some lines above it and some lines after it.
So in your command
b=1 ## means print only 1 line before the matching line
a=19 ## means print 19 lines after the matching line
s="<Comment>Ericsson_OCS_V1_0.0.0.7" ## means search for this string
/var/opt/fds/config/ServiceConfig/ServiceConfig.cfg ## search in this file
/opt/temp/"$circle"_"$sdpid"_RG.cfg ## store the output in this file
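Your formatted command is below. The very first condition, which looked like c-->0 before formatting, is easy to interpret: it means c-- > 0, that is, test whether c is greater than 0 and then decrement c. Because this condition has no action block, awk prints the current line whenever it is true. The NR variable in awk holds the number of the input line currently being processed.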
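nawk '
# An action { must start on the same line as its pattern; otherwise awk
# treats the pattern and the block as two separate rules.
c-- > 0;                              # inside the "after" window: print the line
$0 ~ s {                              # the line matches s
    if (b)
        for (c = b + 1; c > 1; c--)   # print the b saved lines before it
            print r[(NR - c + 1) % b]
    print                             # print the matching line itself
    c = a                             # open a window of a lines after it
}
b {
    r[NR % b] = $0                    # remember the last b lines
}' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg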
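To watch the idiom at work on something small, the sketch below feeds ten numbered lines through the same program with b=1, a=2 and s='^5$'. The sample input and the function name context_demo are made up for illustration, and awk stands in for nawk on the assumption that a POSIX awk is installed.

```shell
# Hypothetical demo: lines "1".."10" as input, match the line that is exactly "5"
context_demo() {
    printf '%s\n' 1 2 3 4 5 6 7 8 9 10 |
    awk '
    c-- > 0;                              # print while the after-window is open
    $0 ~ s {
        if (b)
            for (c = b + 1; c > 1; c--)   # replay the b saved lines
                print r[(NR - c + 1) % b]
        print
        c = a
    }
    b { r[NR % b] = $0 }                  # ring buffer of the last b lines
    ' b=1 a=2 s='^5$' -
}

context_demo
```

With b=1 and a=2 this prints lines 4 to 7: one line before the match, the match itself, and two lines after it.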

Related

using grep to print specific lines in another file

I want to use the grep command to print only those lines that contain a specific string in another file. My file is in xsls format and looks like:
I only want to extract those lines that contain 5'UTR in Annotation column.
The command I use is:
grep 'UTR$' Lee2012.xslx > utr.txt
However, I end up getting an empty file. I'll appreciate if someone can explain what I am doing wrong. Insights will be appreciated. Thank you!
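The pattern 'UTR$' only matches when UTR is the very last thing on the line. In your file it is the sixth column that ends with UTR, not the whole line, which is why you get no results.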
To get all lines where the sixth column ends with UTR, you can use
awk '$6 ~ /UTR$/' Lee2012.xslx > utr.txt
This assumes your column (field) separators are whitespace.
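A small sketch of the difference, using made-up whitespace-separated rows (the real file's columns are not shown in the question, so the layout here is an assumption). It also shows one way to match only 5'UTR: the pattern is passed with -v so that the single quote does not end the awk program.

```shell
# Hypothetical sample rows; column 6 plays the role of the Annotation column
sample_rows() {
    printf '%s\n' \
        "gene1 chr1 100 200 + 5'UTR 0.9" \
        "gene2 chr1 300 400 - CDS 0.8" \
        "gene3 chr2 500 600 + 3'UTR 0.7"
}

# Any row whose sixth field ends with UTR
utr_any() {
    sample_rows | awk '$6 ~ /UTR$/'
}

# Only rows whose sixth field is exactly 5'UTR
utr_five_prime() {
    sample_rows | awk -v pat="^5'UTR\$" '$6 ~ pat'
}

utr_any
utr_five_prime
```

On these rows grep 'UTR$' prints nothing, because every line ends with a number rather than with UTR.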

.ksh paste user input value into dataset

Good morning.
First things first: I know next to nothing about shell scripting in Unix, so please pardon my naivety.
Here's what I'd like to do, and I think it's relatively simple: I would like to create a .ksh file to do two things: 1) take a user-provided numerical value (argument) and paste it into a new column at the end of a dataset (a separate .txt file), and 2) execute a different .ksh script.
I envision calling this script at the Unix prompt, with the input value added thereafter. Something like, "paste_and_run.ksh 58", where 58 would populate a new, final (un-headered) column in an existing dataset (specifically, it'd populate the 77th column).
To be perfectly honest, I'm not even sure where to start with this, so any input would be very appreciated. Apologies for the lack of code within the question. Please let me know if I can offer any more detail, and thank you for taking a look.
I have found the answer: the "nawk" command.
TheNumber=$3
PE_Infile=$1
Where the above variables correspond to the third and first arguments from the command line, respectively. "PE_Infile" represents the file (with full path) to be manipulated, and "TheNumber" represents the number to populate the final column. Then:
nawk -F"|" -v TheNewNumber="$TheNumber" '{print $0 "|" TheNewNumber/10000}' "$PE_Infile" > "$BinFolder/Temp_Input.txt"
Here, -F"|" sets the field delimiter, and -v passes a value into awk as an awk variable. Declaring a new variable (TheNewNumber) is necessary because the awk program sits inside single quotes, so the shell does not expand $TheNumber there; -v hands the shell value over so that the arithmetic can be done within the print statement. print $0 means that the whole line is printed, with the "|" symbol and the command-line value divided by 10000 tacked onto the end. Finally, we have the input file and an output file (Temp_Input.txt, within a path represented by the $BinFolder variable).
Running the desired script afterward was as simple as typing out the script name (with path), and adding corresponding arguments ($2 $3) afterward as needed, each separated by a space.
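Put together, the approach might look like the sketch below. The function name append_column and the paths in the usage comment are hypothetical, and awk is used in place of nawk on the assumption that a POSIX awk is available. Since only $0 is printed, the -F option is left out here; it would matter only if individual fields were referenced.

```shell
# append_column FILE NUMBER
# Prints FILE with NUMBER/10000 appended as a new "|"-separated last column.
append_column() {
    awk -v TheNewNumber="$2" '{ print $0 "|" TheNewNumber / 10000 }' "$1"
}

# Usage sketch inside a script such as paste_and_run.ksh (hypothetical paths):
#   append_column "$PE_Infile" "$TheNumber" > "$BinFolder/Temp_Input.txt"
#   /path/to/other_script.ksh "$2" "$3"
```

Quoting "$1" and "$2" keeps the call working when the file path contains spaces.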

Find lines matching a pattern, provided their value in a specified column occurs exactly twice in the input file

Say the input is (.csv file):
a,b_b,3,c
d,k_k,3,f
g,h_h,3,i
j,k_k,4,l
m,n_n,4,o
p,k_k,5,q
r,s_s,5,t
I want this output:
All lines containing the pattern "k_k" whose number in the third column is found in exactly two lines (ex.: numbers 4 and 5):
j,k_k,4,l
p,k_k,5,q
It might be a simple one, but I can't find a way to achieve this. Could anyone help me using Unix command lines (awk)?
awk '/k_k/' && ?? file.csv
I think you want something like this:
awk -F, 'FNR==NR{a[$3]++;next} /k_k/ {if(a[$3]==2)print $0}' file.csv file.csv
I am assuming you mean that the number in column 3 appears exactly twice in the file, not that it is the number 4 or 5. This solution makes 2 passes over your file to count the number of times each number occurs in column 3 the first time and to print matching lines the second time. Therefore the input file is specified twice on the command line.
As a note of explanation, the array a is indexed by the value in column 3: a[3] holds the number of lines whose third column is 3, a[4] the number of lines whose third column is 4, and so on.
Reading your question title, it says "2 lines maximum", so if occurring in a single line is also OK, you should change the "==" in my code to "<=". I cannot tell which you mean.
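The two-pass idea can be tried on the sample data from the question as follows. The function name two_pass_demo and the temporary file are only there to make the sketch self-contained.

```shell
two_pass_demo() {
    f=$(mktemp)
    cat > "$f" <<'EOF'
a,b_b,3,c
d,k_k,3,f
g,h_h,3,i
j,k_k,4,l
m,n_n,4,o
p,k_k,5,q
r,s_s,5,t
EOF
    # Pass 1 (FNR==NR): count how often each column-3 value occurs.
    # Pass 2: print k_k lines whose column-3 value was seen exactly twice.
    awk -F, 'FNR == NR { count[$3]++; next }
             /k_k/ && count[$3] == 2' "$f" "$f"
    rm -f "$f"
}

two_pass_demo
```

The value 3 occurs three times, so d,k_k,3,f is dropped, while the k_k lines for 4 and 5 are printed.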

Decrypt many PDFs in one go using pdftk

I have 10 PDFs that ask for a user password to open. I know that password. I want to keep them in a decrypted format. Their filenames follow the form:
static_part.dynamic_part_like_date.pdf
I want to convert all the 10 files. I can give a * after the static part and work on all of them, but I also want the corresponding output filenames. So there has to be a way to capture the dynamic part of the filename and then use it in the output filename.
The normal way of doing this for one file is:
pdftk secured.pdf input_pw foopass output unsecured.pdf
I want to do something like:
pdftk var=secured*.pdf input_pw foopass output unsecured+var.pdf
Thanks.
Your request is a little ambiguous, but here are some ideas that might help you.
Assuming 1 of your 10 files is
# static_part.dynamic_part_like_date.pdf
# SalesReport.20110416.pdf (YYYYMMDD)
And you want only the SalesReport.pdf converted as unsecured, you can use a shell script to achieve your requirement:
# make a file with the following contents,
# then make it executable with `chmod 755 pdfFixer.sh`
# the #!/bin/bash line has to be the first line of the file.
$ cat pdfFixer.sh
#!/bin/bash
# call the script like $ pdfFixer.sh staticPart.*.pdf
# ( note: the '$' char is not part of your command, it is the cmd-line prompt
# in this example, yours may look different )
# use a variable to hold the password you want to use
pw=foopass
# "$@" expands to all file names given on the command line
for file in "$@" ; do
# %%.* strips off everything after the first '.' char
unsecuredName=${file%%.*}.pdf
#your example : pdftk secured.pdf input_pw foopass output unsecured.pdf
#converts to
pdftk "${file}" input_pw "${pw}" output "${unsecuredName}"
done
You may find that you need to modify the %%.* thing to
strip less from the end (use %.*) to strip just the last '.' and all chars after it (strip from the right), OR
strip from the front (use #*.) to remove just the static part, leaving the dynamic part, OR
strip from the front (use ##*.) to strip everything up to the last '.' char.
It will really be much easier for you to figure out what you need at the cmd-line.
Set a variable with 1 sample fileName
myTestFileName=staticPart.dynamicPart.pdf
and then use echo combined with the variable modifiers to see the results.
echo "${myTestFileName##*.}"
echo "${myTestFileName#*.}"
echo "${myTestFileName%%.*}"
echo "${myTestFileName%.*}"
etc.
Also notice how I combine a modified variable value with a plain string (.pdf), in unsecuredName=${file%%.*}.pdf
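As a quick check of the modifiers, the sketch below records what each one leaves for a sample name and then builds an output name that keeps the dynamic part. The names secured.20110416.pdf and unsecured follow the question's pattern but are placeholders, and no pdftk call is made.

```shell
myTestFileName=staticPart.dynamicPart.pdf

extension=${myTestFileName##*.}        # longest match from the front  -> pdf
afterFirstDot=${myTestFileName#*.}     # shortest match from the front -> dynamicPart.pdf
beforeFirstDot=${myTestFileName%%.*}   # longest match from the end    -> staticPart
beforeLastDot=${myTestFileName%.*}     # shortest match from the end   -> staticPart.dynamicPart
printf '%s\n' "$extension" "$afterFirstDot" "$beforeFirstDot" "$beforeLastDot"

# Keep the dynamic part in the output name (hypothetical file name)
file=secured.20110416.pdf
dynamic=${file#*.}                     # 20110416.pdf
dynamic=${dynamic%.*}                  # 20110416
unsecuredName=unsecured.${dynamic}.pdf
echo "$unsecuredName"                  # unsecured.20110416.pdf
```

Inside the for loop of a script like pdfFixer.sh, the last three assignments would give each input file its own output name instead of one shared name.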
IHTH

How can I insert a column in numeric comma separated input?

Hi, I have a text file as below.
Input:
326783,326784,402
326783,0326784,402
503534,503535,403
503534,0503535,403
429759,429758,404
429759,0429758,404
409626,409627,405
409626,0409627,405
369917,369916,402
369917,0369916,403
I want to convert it as shown below.
Conditions:
1) Columns 1 and 3 of the input file should be the same for 326784 and 0326784, and so on for the other pairs.
2) If they differ, as in the last case of the input file above, the lines should be printed at the end.
The output should be:
326783,326784,0326784,402
503534,503535,0503535,403
429759,429758,0429758,404
409626,409627,0409627,405
369917,369916,402
369917,0369916,403
I am using the Solaris platform.
Please help me.
I don't understand the logic of your computation, but some general advice: the unix tool awk can do such computations. It understands comma-separated files and you can get it to output other comma-separated files, manipulated by your logic (which you'll have to express in awk syntax).
This is, as I understand it, the unix way to do it.
The way I'd do it (being a non-expert on awk and just mentioning it for completeness ;) would be to write a little python script.
you want to
open an input and an output file
get each line from the input file
parse the integers
perform your logic
write integers to your output file
unchecked python-like code:
f_in = open("input", "r")
f_out = open("output", "w")
for line in f_in.readlines():
    ints = [int(x) for x in line.split(",")]
    f_out.write("%d, %d, %d\n" % (ints[0], ints[1], ints[0]+ints[1]))
f_in.close()
f_out.close()
Here, the logic is in the f_out.write(...) line (this example would output the first, the second and the sum of both input integers)
You can check if you have a Python interpreter at hand by simply typing python and seeing what happens. If you have, save your code into something.py and start it with "python something.py"
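For the awk route mentioned at the start of this answer, a minimal sketch follows. It assumes that the two rows of each pair always sit on consecutive lines and that the second row's column 2 is the first row's column 2 with a leading 0; the question does not confirm either. The function name merge_pairs is made up for illustration, and on Solaris nawk or /usr/xpg4/bin/awk would likely be needed in place of awk.

```shell
# merge_pairs FILE
# Merges each matching pair of lines into one; pairs that do not match
# are held back and printed unchanged at the end.
merge_pairs() {
    awk -F, '
    # odd line: remember it and wait for its partner
    NR % 2 == 1 { c1 = $1; c2 = $2; c3 = $3; first = $0; next }
    {
        if ($1 == c1 && $3 == c3 && $2 == "0" c2)
            print c1 "," c2 "," $2 "," c3
        else
            rest = rest first "\n" $0 "\n"
    }
    END {
        if (NR % 2 == 1) rest = rest first "\n"   # unpaired last line
        printf "%s", rest
    }' "$1"
}

# Usage: merge_pairs input > output
```

Comparing $2 against "0" c2 is a string comparison, so the leading zero is kept rather than lost in a numeric conversion.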
