Remove all text starting from a specific character - unix

I have a file setup like
TEXT1:TEXT2
Basically lines of text separated by a :
I would like all text to the right of the : gone,
so TEXT1:TEXT2 would turn into just TEXT1

Using cut
We tell cut that our field separator is a colon, -d:, and that we want to select the first field, -f1:
$ cut -d: -f1 file
TEXT1
Using sed
We tell sed to remove the first colon on a line and everything after:
$ sed 's/:.*//' file
TEXT1
Using grep
We tell grep to select the first part of each up to but not including the first colon:
$ grep -o '^[^:]*' file
TEXT1

awk -F: '{$0=$1}1' infile
TEXT1
make ":" as your delimiter and then set column1 as your record.

Below script
awk -v FS=":" '{print $1}' file
would also give you the same result.

In AWK, replace everything after the : with nothing:
$ awk 'sub(/:.*/,"",$0)' test
TEXT1

Using sed
sed -i.bkp 's/:.*//' infile.txt
This will also change the file inplace and create a backup file named infile.txt.bkp
Using grep
grep -oP '.*(?=:)' infile.txt

Related

Unix command to parse string

I'm trying to figure out a command to parse the following file content:
Operation=GET
Type=HOME
Counters=CacheHit=0,Exception=1,Validated=0
I need to extract Exception=1 into its own line. I'm fiddling with awk, sed and grep but not making much progress. Does anyone have any tips on using any unix command to perform this?
Thanks
Since your file is close to bash syntax, there is a fun little trick you can do to make bash itself parse the file. First, use some program like tr to transform the input into a something bash can parse, and then "source" that, which will create shell variables you can expand later to get the values.
source <(tr , $'\n' < file_name_goes_here)
echo $Exception
Many ways to do this. Here is one assuming the file is called "file.txt". Grab the line you want, replace everything from the start of the line up to Except with just Except, then pull out the first field using comma as the delimiter.
$ grep Exception file.txt | sed 's/.*Except/Except/g' | cut -d, -f 1
Exception=1
If you wanted to use gawk:
$ grep Exception file.txt | sed 's/.*Except/Except/g' | gawk -F, '{print $1}'
Exception=1
or just using grep and sed:
$ grep Exception file.txt | sed 's/.*\(Exception=[0-9]*\).*/\1/g'
Exception=1
or as #sheltter reminded me:
$ egrep -o "Exception=[0-9]+" file.txt
Exception=1
No need to use a mix of commands.
awk -F, 'NR==2 {print RS$1}' RS="Exception" file
Exception=1
Here we split the line by the keyword we look for RS="Exception"
If the line has two record (only when keyword is found), then
print first field, separated using command, with Record selector.
PS This only works if you have one Exception field

Remove last character from a file in unix

Have
08-01-12|07-30-13|08-09-32|12-43-56|
Want
08-01-12|07-30-13|08-09-32|12-43-56
I want to remove just the last |.
quick and dirty
sed 's/.$//' YourFile
a bit secure
sed 's/[|]$//' YourFile
allowing space
sed 's/[|][[:space:]]*$//' YourFile
same for only last char of last line (thansk #amelie for this comment) :
add a $in front so on quick and dirty it gives sed '$ s/.$//' YourFile
Since you tagged with awk, find here two approaches, one for every interpretation of your question:
If you want to remove | if it is the last character:
awk '{sub(/\|$/,"")}1' file
Equivalent to sed s'/|$//' file, only that escaping | because it has a special meaning in regex content ("or").
If you want to remove the last character, no matter what it is:
awk '{sub(/.$/,"")}1' file
Equivalent to sed s'/.$//' file, since . matches any character.
Test
$ cat a
08-01-12|07-30-13|08-09-32|12-43-56|
rrr.
$ awk '{sub(/\|$/,"")}1' a
08-01-12|07-30-13|08-09-32|12-43-56
rrr. # . is kept
$ awk '{sub(/.$/,"")}1' a
08-01-12|07-30-13|08-09-32|12-43-56
rrr # . is also removed

search and replace the value inside tags using script

I have a file like this. abc.txt
<ra><r>12.34</r><e>235</e><a>34.908</a><r>23</r><a>234.09</a><p>234</p><a>23</a></ra>
<hello>sadfaf</hello>
<hi>hiisadf</hi>
<ra><s>asdf</s><qw>345</qw><a>345</a><po>234</po><a>345</a></ra>
What I have to do is I have to find <ra> tag and for inside <ra> tag there is <a> tag whose valeus I have to replace by 0.00.
grep "<ra>" "abc.txt" | grep "<a>"
I am able to find and but I dont know how to change.
Output file for this:-
<ra><r>12.34</r><e>235</e><a>0.00</a><r>23</r><a>0.00</a><p>234</p><a>0.00</a></ra>
<hello>sadfaf</hello>
<hi>hiisadf</hi>
<ra><s>asdf</s><qw>345</qw><a>0.00</a><po>234</po><a>0.00</a></ra>
Replace using awk and gsub
awk '/^<ra>/ {gsub(/<a>[^<]*</,"<a>0.00<")}1' file
<ra><r>12.34</r><e>235</e><a>0.00</a><r>23</r><a>0.00</a><p>234</p><a>0.00</a></ra>
<hello>sadfaf</hello>
<hi>hiisadf</hi>
<ra><s>asdf</s><qw>345</qw><a>0.00</a><po>234</po><a>0.00</a></ra>
This sed should work:
sed -i.bak '/<ra>/s~\(<a>\)[^<]*\(</a>\)~\10.00\2~g' abc.txt
<ra><r>12.34</r><e>235</e><a>0.00</a><r>23</r><a>0.00</a><p>234</p><a>0.00</a></ra>
<hello>sadfaf</hello>
<hi>hiisadf</hi>
<ra><s>asdf</s><qw>345</qw><a>0.00</a><po>234</po><a>0.00</a></ra>
Because of -i (inline) switch this sed will save the changes in original file itself.
You can try with the following code:
$ sed -e '/<ra>/ s#<a>[^<]*<#<a>0.00<#g' file
<ra><r>12.34</r><e>235</e><a>0.00</a><r>23</r><a>0.00</a><p>234</p><a>0.00</a></ra>
<hello>sadfaf</hello>
<hi>hiisadf</hi>
<ra><s>asdf</s><qw>345</qw><a>0.00</a><po>234</po><a>0.00</a></ra>
It is based on this structure:
Print # in lines starting with BBB just if there was not ^# before
sed -e '/^BBB/ s/^#*/#/' -i file
changing the delimiter to a # so we do not need to escape the / in </a>.
Note that if you want the file to be updated you need to add -i to the sed (sed -i -e ...). Otherwise the result will be printed in the stdout.

How to remove blank lines from a Unix file

I need to remove all the blank lines from an input file and write into an output file. Here is my data as below.
11216,33,1032747,64310,1,0,0,1.878,0,0,0,1,1,1.087,5,1,1,18-JAN-13,000603221321
11216,33,1033196,31300,1,0,0,1.5391,0,0,0,1,1,1.054,5,1,1,18-JAN-13,059762153003
11216,33,1033246,31300,1,0,0,1.5391,0,0,0,1,1,1.054,5,1,1,18-JAN-13,000603211032
11216,33,1033280,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,055111034001
11216,33,1033287,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,000378689701
11216,33,1033358,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,000093737301
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802041926
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802041954
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802049326
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802049383
11216,33,1036985,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000093415580
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781202001
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781261305
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781603955
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781615746
sed -i '/^$/d' foo
This tells sed to delete every line matching the regex ^$ i.e. every empty line. The -i flag edits the file in-place, if your sed doesn't support that you can write the output to a temporary file and replace the original:
sed '/^$/d' foo > foo.tmp
mv foo.tmp foo
If you also want to remove lines consisting only of whitespace (not just empty lines) then use:
sed -i '/^[[:space:]]*$/d' foo
Edit: also remove whitespace at the end of lines, because apparently you've decided you need that too:
sed -i '/^[[:space:]]*$/d;s/[[:space:]]*$//' foo
awk 'NF' filename
awk 'NF > 0' filename
sed -i '/^$/d' filename
awk '!/^$/' filename
awk '/./' filename
The NF also removes lines containing only blanks or tabs, the regex /^$/ does not.
Use grep to match any line that has nothing between the start anchor (^) and the end anchor ($):
grep -v '^$' infile.txt > outfile.txt
If you want to remove lines with only whitespace, you can still use grep. I am using Perl regular expressions in this example, but here are other ways:
grep -P -v '^\s*$' infile.txt > outfile.txt
or, without Perl regular expressions:
grep -v '^[[:space:]]*$' infile.txt > outfile.txt
sed -e '/^ *$/d' input > output
Deletes all lines which consist only of blanks (or is completely empty). You can change the blank to [ \t] where the \t is a representation for tab. Whether your shell or your sed will do the expansion varies, but you can probably type the tab character directly. And if you're using GNU or BSD sed, you can do the edit in-place, if that's what you want, with the -i option.
If I execute the above command still I have blank lines in my output file. What could be the reason?
There could be several reasons. It might be that you don't have blank lines but you have lots of spaces at the end of a line so it looks like you have blank lines when you cat the file to the screen. If that's the problem, then:
sed -e 's/ *$//' -e '/^ *$/d' input > output
The new regex removes repeated blanks at the end of the line; see previous discussion for blanks or tabs.
Another possibility is that your data file came from Windows and has CRLF line endings. Unix sees the carriage return at the end of the line; it isn't a blank, so the line is not removed. There are multiple ways to deal with that. A reliable one is tr to delete (-d) character code octal 15, aka control-M or \r or carriage return:
tr -d '\015' < input | sed -e 's/ *$//' -e '/^ *$/d' > output
If neither of those works, then you need to show a hex dump or octal dump (od -c) of the first two lines of the file, so we can see what we're up against:
head -n 2 input | od -c
Judging from the comments that sed -i does not work for you, you are not working on Linux or Mac OS X or BSD — which platform are you working on? (AIX, Solaris, HP-UX spring to mind as relatively plausible possibilities, but there are plenty of other less plausible ones too.)
You can try the POSIX named character classes such as sed -e '/^[[:space:]]*$/d'; it will probably work, but is not guaranteed. You can try it with:
echo "Hello World" | sed 's/[[:space:]][[:space:]]*/ /'
If it works, there'll be three spaces between the 'Hello' and the 'World'. If not, you'll probably get an error from sed. That might save you grief over getting tabs typed on the command line.
grep . file
grep looks at your file line-by-line; the dot . matches anything except a newline character. The output from grep is therefore all the lines that consist of something other than a single newline.
with awk
awk 'NF > 0' filename
To be thorough and remove lines even if they include spaces or tabs something like this in perl will do it:
cat file.txt | perl -lane "print if /\S/"
Of course there are the awk and sed equivalents. Best not to assume the lines are totally blank as ^$ would do.
Cheers
You can sed's -i option to edit in-place without using temporary file:
sed -i '/^$/d' file

find and replace from command line unix

I have a multi line text file where each line has the format
..... Game #29832: ......
I want to append the character '1' to each number on each line (which is different on every line), does anyone know of a way to do this from the command line?
Thanks
sed -i -e 's/Game #[0-9]*/&1/' file
-i is for in-place editing, and & means whatever matched from the pattern. If you don't want to overwrite the file, omit the -i flag.
Using sed:
cat file | sed -e 's/\(Game #[0-9]*\)/\11/'
sed 's/ Game #\([0-9]*\):/ Game #1\1:/' yourfile.txt
GNU awk
awk '{b=gensub(/(Game #[0-9]+)/ ,"\\11","g",$0); print b }' file

Resources