Unix command to parse string - unix

I'm trying to figure out a command to parse the following file content:
Operation=GET
Type=HOME
Counters=CacheHit=0,Exception=1,Validated=0
I need to extract Exception=1 into its own line. I'm fiddling with awk, sed and grep but not making much progress. Does anyone have any tips on using any unix command to perform this?
Thanks

Since your file is close to bash syntax, there is a fun little trick you can do to make bash itself parse the file. First, use some program like tr to transform the input into a something bash can parse, and then "source" that, which will create shell variables you can expand later to get the values.
source <(tr , $'\n' < file_name_goes_here)
echo $Exception

Many ways to do this. Here is one assuming the file is called "file.txt". Grab the line you want, replace everything from the start of the line up to Except with just Except, then pull out the first field using comma as the delimiter.
$ grep Exception file.txt | sed 's/.*Except/Except/g' | cut -d, -f 1
Exception=1
If you wanted to use gawk:
$ grep Exception file.txt | sed 's/.*Except/Except/g' | gawk -F, '{print $1}'
Exception=1
or just using grep and sed:
$ grep Exception file.txt | sed 's/.*\(Exception=[0-9]*\).*/\1/g'
Exception=1
or as #sheltter reminded me:
$ egrep -o "Exception=[0-9]+" file.txt
Exception=1

No need to use a mix of commands.
awk -F, 'NR==2 {print RS$1}' RS="Exception" file
Exception=1
Here we split the line by the keyword we look for RS="Exception"
If the line has two record (only when keyword is found), then
print first field, separated using command, with Record selector.
PS This only works if you have one Exception field

Related

using sed -n with variables

I am having a log file a.log and i need to extract a piece of information from it.
To locate the start and end line numbers of the pattern i am using the following.
start=$(sed -n '/1112/=' file9 | head -1)
end=$(sed -n '/true/=' file9 | head -1)
i need to use the variables (start,end) in the following command:
sed -n '16q;12,15p' orig-data-file > new-file
so that the above command appears something like:
sed -n '($end+1)q;$start,$end'p orig-data-file > new-file
I am unable to replace the line numbers with the variables. Please suggest the correct syntax.
Thanks,
Rosy
When I realized how to do it, I was looking for anyway to get line number into a file containing the requested info, and display the file from that line to EOF.
So, this was my way.
with
PATTERN="pattern"
INPUT_FILE="file1"
OUTPUT_FILE="file2"
line number of first match of $PATTERN into $INPUT_FILE can be retrieved with
LINE=`grep -n ${PATTERN} ${INPUT_FILE} | awk -F':' '{ print $1 }' | head -n 1`
and the outfile will be the text from that $LINE to EOF. This way:
sed -n ${LINE},\$p ${INPUT_FILE} > ${OUTPUT_FILE}
The point here, is the way how can variables be used with command sed -n:
first witout using variables
sed -n 'N,$p' <file name>
using variables
LINE=<N>; sed -n ${LINE},\$p <file name>
Remove the single quotes thus. Single quotes turn off the shell parsing of the string. You need shell parsing to do the variable string replacements.
sed -n '('$end'+1)q;'$start','$end''p orig-data-file > new-file

How to remove blank lines from a Unix file

I need to remove all the blank lines from an input file and write into an output file. Here is my data as below.
11216,33,1032747,64310,1,0,0,1.878,0,0,0,1,1,1.087,5,1,1,18-JAN-13,000603221321
11216,33,1033196,31300,1,0,0,1.5391,0,0,0,1,1,1.054,5,1,1,18-JAN-13,059762153003
11216,33,1033246,31300,1,0,0,1.5391,0,0,0,1,1,1.054,5,1,1,18-JAN-13,000603211032
11216,33,1033280,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,055111034001
11216,33,1033287,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,000378689701
11216,33,1033358,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,000093737301
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802041926
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802041954
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802049326
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802049383
11216,33,1036985,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000093415580
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781202001
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781261305
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781603955
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781615746
sed -i '/^$/d' foo
This tells sed to delete every line matching the regex ^$ i.e. every empty line. The -i flag edits the file in-place, if your sed doesn't support that you can write the output to a temporary file and replace the original:
sed '/^$/d' foo > foo.tmp
mv foo.tmp foo
If you also want to remove lines consisting only of whitespace (not just empty lines) then use:
sed -i '/^[[:space:]]*$/d' foo
Edit: also remove whitespace at the end of lines, because apparently you've decided you need that too:
sed -i '/^[[:space:]]*$/d;s/[[:space:]]*$//' foo
awk 'NF' filename
awk 'NF > 0' filename
sed -i '/^$/d' filename
awk '!/^$/' filename
awk '/./' filename
The NF also removes lines containing only blanks or tabs, the regex /^$/ does not.
Use grep to match any line that has nothing between the start anchor (^) and the end anchor ($):
grep -v '^$' infile.txt > outfile.txt
If you want to remove lines with only whitespace, you can still use grep. I am using Perl regular expressions in this example, but here are other ways:
grep -P -v '^\s*$' infile.txt > outfile.txt
or, without Perl regular expressions:
grep -v '^[[:space:]]*$' infile.txt > outfile.txt
sed -e '/^ *$/d' input > output
Deletes all lines which consist only of blanks (or is completely empty). You can change the blank to [ \t] where the \t is a representation for tab. Whether your shell or your sed will do the expansion varies, but you can probably type the tab character directly. And if you're using GNU or BSD sed, you can do the edit in-place, if that's what you want, with the -i option.
If I execute the above command still I have blank lines in my output file. What could be the reason?
There could be several reasons. It might be that you don't have blank lines but you have lots of spaces at the end of a line so it looks like you have blank lines when you cat the file to the screen. If that's the problem, then:
sed -e 's/ *$//' -e '/^ *$/d' input > output
The new regex removes repeated blanks at the end of the line; see previous discussion for blanks or tabs.
Another possibility is that your data file came from Windows and has CRLF line endings. Unix sees the carriage return at the end of the line; it isn't a blank, so the line is not removed. There are multiple ways to deal with that. A reliable one is tr to delete (-d) character code octal 15, aka control-M or \r or carriage return:
tr -d '\015' < input | sed -e 's/ *$//' -e '/^ *$/d' > output
If neither of those works, then you need to show a hex dump or octal dump (od -c) of the first two lines of the file, so we can see what we're up against:
head -n 2 input | od -c
Judging from the comments that sed -i does not work for you, you are not working on Linux or Mac OS X or BSD — which platform are you working on? (AIX, Solaris, HP-UX spring to mind as relatively plausible possibilities, but there are plenty of other less plausible ones too.)
You can try the POSIX named character classes such as sed -e '/^[[:space:]]*$/d'; it will probably work, but is not guaranteed. You can try it with:
echo "Hello World" | sed 's/[[:space:]][[:space:]]*/ /'
If it works, there'll be three spaces between the 'Hello' and the 'World'. If not, you'll probably get an error from sed. That might save you grief over getting tabs typed on the command line.
grep . file
grep looks at your file line-by-line; the dot . matches anything except a newline character. The output from grep is therefore all the lines that consist of something other than a single newline.
with awk
awk 'NF > 0' filename
To be thorough and remove lines even if they include spaces or tabs something like this in perl will do it:
cat file.txt | perl -lane "print if /\S/"
Of course there are the awk and sed equivalents. Best not to assume the lines are totally blank as ^$ would do.
Cheers
You can sed's -i option to edit in-place without using temporary file:
sed -i '/^$/d' file

Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?

I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:
2U2133 1239
1290fsdsf 3234
From this, I need to extract
1239
3234
The delimiter for all records will be always 3 blanks.
I need to do this in an unix script(.scr) and write the output to another file or use it as an input to a do-while loop. I tried the below:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then
int_1=0
else
int_2=0
fi
done < awk -F' ' '{ print $2 }' ${Directoty path}/test_file.txt
test_file.txt is the input file and file1.txt is a lookup file. But the above way is not working and giving me syntax errors near awk -F
I tried writing the output to a file. The following worked in command line:
more test_file.txt | awk -F' ' '{ print $2 }' > output.txt
This is working and writing the records to output.txt in command line. But the same command does not work in the unix script (It is a .scr file)
Please let me know where I am going wrong and how I can resolve this.
Thanks,
Visakh
The job of replacing multiple delimiters with just one is left to tr:
cat <file_name> | tr -s ' ' | cut -d ' ' -f 2
tr translates or deletes characters, and is perfectly suited to prepare your data for cut to work properly.
The manual states:
-s, --squeeze-repeats
replace each sequence of a repeated character that is
listed in the last specified SET, with a single occurrence
of that character
It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:
cut -i -d' ' -f 2 data.file
If not (and it is not universal — and maybe not even widespread, since neither GNU nor MacOS X have the option), then using awk is better and more portable.
You need to pipe the output of awk into your loop, though:
awk -F' ' '{print $2}' ${Directory_path}/test_file.txt |
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then int_1=0
else int_2=0
fi
done
The only residual issue is whether the while loop is in a sub-shell and and therefore not modifying your main shell scripts variables, just its own copy of those variables.
With bash, you can use process substitution:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then int_1=0
else int_2=0
fi
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)
This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.
The blank in ${Directory path} is not normally legal — unless it is another Bash feature I've missed out on; you also had a typo (Directoty) in one place.
Other ways of doing the same thing aside, the error in your program is this: You cannot redirect from (<) the output of another program. Turn your script around and use a pipe like this:
awk -F' ' '{ print $2 }' ${Directory path}/test_file.txt | while read readline
etc.
Besides, the use of "readline" as a variable name may or may not get you into problems.
In this particular case, you can use the following line
sed 's/ /\t/g' <file_name> | cut -f 2
to get your second columns.
In bash you can start from something like this:
for n in `${Directoty path}/test_file.txt | cut -d " " -f 4`
{
grep -c $n ${Directory path}/file*.txt
}
This should have been a comment, but since I cannot comment yet, I am adding this here.
This is from an excellent answer here: https://stackoverflow.com/a/4483833/3138875
tr -s ' ' <text.txt | cut -d ' ' -f4
tr -s '<character>' squeezes multiple repeated instances of <character> into one.
It's not working in the script because of the typo in "Directo*t*y path" (last line of your script).
Cut isn't flexible enough. I usually use Perl for that:
cat file.txt | perl -F' ' -e 'print $F[1]."\n"'
Instead of a triple space after -F you can put any Perl regular expression. You access fields as $F[n], where n is the field number (counting starts at zero). This way there is no need to sed or tr.

In-place processing with grep

I've got a script that calls grep to process a text file. Currently I am doing something like this.
$ grep 'SomeRegEx' myfile.txt > myfile.txt.temp
$ mv myfile.txt.temp myfile.txt
I'm wondering if there is any way to do in-place processing, as in store the results to the same original file without having to create a temporary file and then replace the original with the temp file when processing is done.
Of course I welcome comments as to why this should or should not be done, but I'm mainly interested in whether it can be done. In this example I'm using grep, but I'm interested about Unix tools in general. Thanks!
sponge (in moreutils package in Debian/Ubuntu) reads input till EOF and writes it into file, so you can grep file and write it back to itself.
Like this:
grep 'pattern' file | sponge file
Perl has the -i switch, so does sed and Ruby
sed -i.bak -n '/SomeRegex/p' file
ruby -i.bak -ne 'print if /SomeRegex/' file
But note that all it ever does is creating "temp" files at the back end which you think you don't see, that's all.
Other ways, besides grep
awk
awk '/someRegex/' file > t && mv t file
bash
while read -r line;do case "$line" in *someregex*) echo "$line";;esac;done <file > t && mv t file
No, in general it can't be done in Unix like this. You can only create/truncate (with >) or append to a file (with >>). Once truncated, the old contents would be lost.
In general, this can't be done. But Perl has the -i switch:
perl -i -ne 'print if /SomeRegEx/' myfile.txt
Writing -i.bak will cause the original to be saved in myfile.txt.bak.
(Of course internally, Perl just does basically what you're already doing -- there's no special magic involved.)
To edit file in-place using vim-way, try:
$ ex -s +'%!grep foo' -cxa myfile.txt
Alternatively use sed or gawk.
Most installations of sed can do in-place editing, check the man page, you probably want the -i flag.
Store in a variable and then assign it to the original file:
A=$(cat aux.log | grep 'Something') && echo "${A}" > aux.log
Take a look at my slides "Field Guide To the Perl Command-Line Options" at http://petdance.com/perl/command-line-options.pdf for more ideas on what you can do in place with Perl.
cat myfile.txt | grep 'sometext' > myfile.txt
This will find sometext in myfile.txt and save it back to myfile.txt, this will accomplish what you want. Not sure about regex, but it does work for text.

find and replace from command line unix

I have a multi line text file where each line has the format
..... Game #29832: ......
I want to append the character '1' to each number on each line (which is different on every line), does anyone know of a way to do this from the command line?
Thanks
sed -i -e 's/Game #[0-9]*/&1/' file
-i is for in-place editing, and & means whatever matched from the pattern. If you don't want to overwrite the file, omit the -i flag.
Using sed:
cat file | sed -e 's/\(Game #[0-9]*\)/\11/'
sed 's/ Game #\([0-9]*\):/ Game #1\1:/' yourfile.txt
GNU awk
awk '{b=gensub(/(Game #[0-9]+)/ ,"\\11","g",$0); print b }' file

Resources