Remove all lines except matching pattern line best practice (sed)

Remove all lines except matching pattern line best practice (sed) - unix

I want to remove all lines except the line(s) containing matching pattern.
This is how I did it:
sed -n 's/matchingpattern/matchingpattern/p' file.txt
But I'm just curious because I rename matching pattern to the matching pattern itself. Looks like a waste here.
Is there a better way to do this?

sed '/pattern/!d' file.txt
But you're reinventing grep here.

grep is certainly better...because it's much faster.
e.g. using grep to extract all genome sequence data for chromosome 6 in a data set I'm working with:
$ time grep chr6 seq_file.in > temp.out
real 0m11.902s
user 0m9.564s
sys 0m1.912s
compared to sed:
$ time sed '/chr6/!d' seq_file.in > temp.out
real 0m21.217s
user 0m18.920s
sys 0m1.860s
I repeated it 3X and ~same values each time.

This might work for you:
sed -n '/matchingpattern/p' file.txt
/.../ is an address which may have actions attached in this case p.

Instead of using sed, which is complicated, use grep.
grep matching_pattern file
This should give you the desired result.

Related

Removing comments from a datafile. What are the differences?

Let's say that you would like to remove comments from a datafile using one of two methods:
cat file.dat | sed -e "s/\#.*//"
cat file.dat | grep -v "#"
How do these individual methods work, and what is the difference between them? Would it also be possible for a person to write the clean data to a new file, while avoiding any possible warnings or error messages to end up in that datafile? If so, how would you go about doing this?

How do these individual methods work, and what is the difference
between them?
Yes, they work same though sed and grep are 2 different commands. Your sed command simply substitutes all those lines which having # with NULL. On other hand grep will simply skip or ignore those lines which will skip lines which have # in it.
You could get more information on these by man page as follows:
man grep:
-v, --invert-match
Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
man sed:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The
replacement may
contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1
through \9 to
refer to the corresponding matching sub-expressions in the regexp.
Would it also be possible for a person to write the clean data to a
new file, while avoiding any possible warnings or error messages to
end up in that datafile?
yes, we could re-direct the errors by using 2>/dev/null in both the commands.
If so, how would you go about doing this?
You could try like 2>/dev/null 1>output_file
Explanation of sed command: Adding explanation of sed command too now. This is only for understanding purposes and no need to use cat and then use sed you could use sed -e "s/\#.*//" Input_file instead.
sed -e " ##Initiating sed command here with adding the script to the commands to be executed
s/ ##using s for substitution of regexp following it.
\#.* ##telling sed to match a line if it has # till everything here.
//" ##If match found for above regexp then substitute it with NULL.

That grep -v will lose all the lines that have # on them, for example:
$ cat file
first
# second
thi # rd
so
$ grep -v "#" file
first
will drop off all lines with # on it which is not favorable. Rather you should:
$ grep -o "^[^#]*" file
first
thi
like that sed command does but this way you won't get empty lines. man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.

How to find a pattern using sed?

How can I combine multiple filters using sed?
Here's my data set
sex,city,age
male,london,32
male,manchester,32
male,oxford,64
female,oxford,23
female,london,33
male,oxford,45
I want to identify all lines which contain MALE AND OXFORD. Here's my approach:
sed -n '/male/,/oxford/p' file
Thanks

You can associate a block with the first check and put the second in there. For example:
sed -n '/male/ { /oxford/ p; }' file
Or invert the check and action:
sed '/male/!d; /oxford/!d' file
However, since (as #Jotne points out) lines that contain female also contain male and you probably don't want to match them, the patterns should at least be amended to contain word boundaries:
sed -n '/\<male\>/ { /\<oxford\>/ p; }' file
sed '/\<male\>/!d; /\<oxford\>/!d' file
But since that looks like comma-separated data and the check is probably not meant to test whether someone went to male university, it would probably be best to use a stricter check with awk:
awk -F, '$1 == "male" && $2 == "oxford"' file
This checks not only if a line contains male and oxford but also if they are in the appropriate fields. The same can be achieved, somewhat less prettily, with sed by using
sed '/^male,oxford,/!d' file

A single sed command command can be used to solve this. Let's look at two variations of using sed:
$ sed -e 's/^\(male,oxford,.*\)$/\1/;t;d' file
male,oxford,64
male,oxford,45
$ sed -e 's/^male,oxford,\(.*\)$/\1/;t;d' file
64
45
Both have the essentially the same regex:
^male,oxford,.*$
The interesting features are the capture group placement (either the whole line or just the age portion) and the use of ;t;d to discard non matching lines.
By doing it this way, we can avoid the requirement of using awk or grep to solve this problem.

You can use awk
awk -F, '/\<male\>/ && /\<oxford\>/' file
male,oxford,64
male,oxford,45
It uses the word anchor to prevent hit on female.

change time format using sed

I've come across a bunch of files I need to import into my database with an awful time format
A09:13:08C
not even sure what it stands for
Is there any fast way using sed to replace 'A' by space and delete 'C'?

sed -r 's/A(.*)C/ \1/' filename
Simply you are saving all the words between A and C and then using it with \1
A more careful sentence would be:
sed -r 's/A([0-9:]+)C/ \1/'

Presumably, there is other data on the line, so using a casual .* is likely to mangle things. I'd use a rather verbose but restrictive pattern:
sed -e 's/A\([012][0-9]:[0-5][0-9]:[0-5][0-9]\)C/ \1/'
This looks for an A followed by a 24-hour clock time value and C, preserving the time portion. It would accept some invalid times (25-29 as the hour; indeed, 24:00:01 is not normally valid either, but 24:00:00 can be); it would be your judgement call whether it is worth refining these patterns (frankly, I doubt it, but it depends on how well you know your data).

If this is all that is in the file then
grep -o '[^AC]\+' file
If there are other fields i would use (g)awk.
Where N is the field.
awk '{match($1,/([^AC]+)/,x)}$1=x[1]' file

This looks much simplier:
tr A ' ' | tr -d C

modifiy grep output with find and replace cmd

I use grep to sort log big file into small one but still there is long dir path in output log file which is common every time.I have to do find and replace every time.
Isnt there any way i can grep -r "format" log.log | execute findnreplce thing?

Sed will do what you want. Basic syntax to replace all the matches of foo with bar in-place in $file is:
sed -i 's/foo/bar/g' $file
If you're just wanting to delete rather than replace, simply leave out the 'bar' (so s/foo//g).
See this tutorial for a lot more detail, such as regex support.

sed -n '/match/s/pattern/repl/p'
Will print all the lines that match the regex match, with all instances of pattern replaced by repl. Since your lines may contain paths, you will probably want to use a different delimeter. / is customary, but you can also do:
sed -n '\#match#s##repl#p`
In the second case, omitting pattern will cause match to be used for the pattern to be replaced.

excluding first and last lines from sed /START/,/END/

Consider the input:
=sec1=
some-line
some-other-line
foo
bar=baz
=sec2=
c=baz
If I wish to process only =sec1= I can for example comment out the section by:
sed -e '/=sec1=/,/=[a-z]*=/s:^:#:' < input
... well, almost.
This will comment the lines including "=sec1=" and "=sec2=" lines, and the result will be something like:
#=sec1=
#some-line
#some-other-line
#
#foo
#bar=baz
#
#=sec2=
c=baz
My question is: What is the easiest way to exclude the start and end lines from a /START/,/END/ range in sed?
I know that for many cases refinement of the "s:::" claws can give solution in this specific case, but I am after the generic solution here.
In "Sed - An Introduction and Tutorial" Bruce Barnett writes: "I will show you later how to restrict a command up to, but not including the line containing the specified pattern.", but I was not able to find where he actually show this.
In the "USEFUL ONE-LINE SCRIPTS FOR SED" Compiled by Eric Pement, I could find only the inclusive example:
# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive

This should do the trick:
sed -e '/=sec1=/,/=sec2=/ { /=sec1=/b; /=sec2=/b; s/^/#/ }' < input
This matches between sec1 and sec2 inclusively and then just skips the first and last line with the b command. This leaves the desired lines between sec1 and sec2 (exclusive), and the s command adds the comment sign.
Unfortunately, you do need to repeat the regexps for matching the delimiters. As far as I know there's no better way to do this. At least you can keep the regexps clean, even though they're used twice.
This is adapted from the SED FAQ: How do I address all the lines between RE1 and RE2, excluding the lines themselves?

If you're not interested in lines outside of the range, but just want the non-inclusive variant of the Iowa/Montana example from the question (which is what brought me here), you can write the "except for the first and last matching lines" clause easily enough with a second sed:
sed -n '/PATTERN1/,/PATTERN2/p' < input | sed '1d;$d'
Personally, I find this slightly clearer (albeit slower on large files) than the equivalent
sed -n '1,/PATTERN1/d;/PATTERN2/q;p' < input

Another way would be
sed '/begin/,/end/ {
/begin/n
/end/ !p
}'
/begin/n -> skip over the line that has the "begin" pattern
/end/ !p -> print all lines that don't have the "end" pattern
Taken from Bruce Barnett's sed tutorial http://www.grymoire.com/Unix/Sed.html#toc-uh-35a

I've used:
sed '/begin/,/end/{/begin\|end/!p}'
This will search all the lines between the patterns, then print everything not containing the patterns

you could also use awk
awk '/sec1/{f=1;print;next}f && !/sec2/{ $0="#"$0}/sec2/{f=0}1' file

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex