In sed, is it possible to skip the first n lines when applying a regex? I am currently using the following:
cat test | sed '/^Name/d;/^----------/d;/^(/d;/^$/d'
on the following file:
Name
John
Albert
Mora
Name
Tommy
Tammy
In one pass, I want to apply several regexes (one of them removes lines containing Name, but it should skip the first line) to obtain the following:
Name
John
Albert
Mora
Tommy
Tammy
Because the file is huge, I don't want to make multiple passes, so any one-pass approach would be great.
Yes, you can apply sed commands to ranges of lines with the N,M syntax. In this case you want something like this:
sed -e '2,$s/foo/bar/'
An example with delete:
sed -e '2,${ /^Name/d; }'
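Combining this with the other deletions from the question gives a single pass over the file: only the Name rule gets the range starting at line 2, while the dash, parenthesis and blank-line rules keep applying to every line. A minimal sketch, assuming the question's sample data is fed in with printf instead of being read from the test file, and using a hypothetical shell variable n for the number of leading lines to protect:

```shell
# n: hypothetical number of leading lines the Name rule must not touch
n=1

# Sample input from the question (stands in for the "test" file)
printf '%s\n' Name John Albert Mora Name Tommy Tammy |
  sed -e '/^----------/d; /^(/d; /^$/d' \
      -e "$((n + 1)),\${ /^Name/d; }"
```

With n=1 the second script expands to 2,${ /^Name/d; }, and the output should be the expected list: Name, John, Albert, Mora, Tommy, Tammy.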
Learning sed and patterns.
I have lines of input that look like this:
1000001,P00069042,F,0-17,10,A,2,0,3,,,8370
1000001,P00248942,F,0-17,10,A,2,0,1,6,14,15200
1000001,P00087842,F,0-17,10,A,2,0,12,,,1422
1000001,P00085442,F,0-17,10,A,2,0,12,14,,1057
1000002,P00285442,M,55+,16,C,4+,0,8,,,7969
1000003,P00193542,M,26-35,15,A,3,0,1,2,,15227
I need to extract the first, second, and last fields. For the first line, the output would be something like:
1000001 P00069042 8370
I have tried sed -n 's/,.*,.*,/ /p' but it only returns the first and last fields.
I have also tried sed -n 's/\([^,]*,[^,]*,\).*,/ /p' but it only returns the last field.
My approach is to delete everything between the second comma and the last comma, but I don't know how to specify the second comma.
I'm aware this can be done with cut or awk, but I'm trying to figure out sed.
sed is great for unstructured or raw stream data; this isn't one of those cases.
Nonetheless, the trick to using sed to pick out "fields" is to:
Create a regex that matches the whole line
Use capture groups \(..\) to pick out the parts you want to save
Use the [^<c>]*<c> idiom to enforce non-greedy behavior where needed (<c> is any char)
Replace the entire line using back-references from the capture groups you saved
$ echo 1000001,P00069042,F,0-17,10,A,2,0,3,,,8370 |\
sed 's/^\([^,]*\),\([^,]*\),.*,\(.*\)$/\1 \2 \3/'
1000001 P00069042 8370
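The asker's own idea, deleting everything between the second and the last comma, also works once the first two fields are captured and written back; the second attempt in the question lost them because its replacement had no \1. A minimal sketch on all six sample rows, with the separators turned into spaces:

```shell
# Capture fields 1 and 2; the greedy ".*," then swallows everything up to the
# last comma, so whatever follows the match (the last field) is kept unchanged.
printf '%s\n' \
  '1000001,P00069042,F,0-17,10,A,2,0,3,,,8370' \
  '1000001,P00248942,F,0-17,10,A,2,0,1,6,14,15200' \
  '1000001,P00087842,F,0-17,10,A,2,0,12,,,1422' \
  '1000001,P00085442,F,0-17,10,A,2,0,12,14,,1057' \
  '1000002,P00285442,M,55+,16,C,4+,0,8,,,7969' \
  '1000003,P00193542,M,26-35,15,A,3,0,1,2,,15227' |
  sed 's/^\([^,]*\),\([^,]*\),.*,/\1 \2 /'
```

This should print 1000001 P00069042 8370 for the first row, and likewise for the others. Rows with fewer than four fields would not match and would pass through unchanged.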
Hi all, I'm trying to replace all spaces starting from a certain part of my file, but I can't make the replacement start at that point.
I tried sed "s/\s/_/g" < file.txt > file_1.txt, but all of the spaces turn into underscores.
Contents of file.txt:
My Name
Favorite Food
Favorite Color
Time is gold
List of Dogs:
Shi ba Inu
Sibe rian Husky
Labra dor Retriever
Ger man Shep herd
Bull Doge
Be agle
chi hua hua
Bull Ter rier
Expected file_1.txt:
My Name
Favorite Food
Favorite Color
Time is gold
List of Dogs:
Shi_ba_Inu
Sibe_rian_Husky
Labra_dor_Retriever
Ger_man_Shep_herd
Bull_Doge
Be_agle
chi_hua_hua
Bull_Ter_rier
If you want the substitution to happen only after "List of Dogs", try:
sed -e '1,/List of Dogs:/b' -e 's/\s/_/g'
The command b means "branch" (to the end of the script, i.e. bypass the substitution) and the address range specifies this action for the first line through the first line matching the regex.
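\s is a GNU sed extension. A sketch of the same branch-based idea that should also work with seds lacking \s uses the POSIX class [[:space:]] instead; the sample lines are the question's file.txt, fed in with printf:

```shell
# Lines 1 through "List of Dogs:" branch to the end of the script (printed as-is);
# every later line has each whitespace character replaced by "_".
printf '%s\n' 'My Name' 'Favorite Food' 'Favorite Color' 'Time is gold' \
  'List of Dogs:' 'Shi ba Inu' 'Sibe rian Husky' 'Labra dor Retriever' \
  'Ger man Shep herd' 'Bull Doge' 'Be agle' 'chi hua hua' 'Bull Ter rier' |
  sed -e '1,/List of Dogs:/b' -e 's/[[:space:]]/_/g'
```

The first five lines, up to and including "List of Dogs:", should come out unchanged, and the dog names should come out as Shi_ba_Inu, Sibe_rian_Husky, and so on.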
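If you want the substitution to happen only after the :, use something like this:
sed -r '/:/,$ s/\s/_/g;' file.txt > file_1.txt
The substitution is restricted to the range from the first line containing : to the end of the file ($). Note that this range includes the "List of Dogs:" line itself, so that line also becomes List_of_Dogs:.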
Given your initial input file.txt from the question, you can try this:
$ sed '/List of Dogs/,$s/\s/_/g;s/List_of_Dogs/List of Dogs/g' file.txt
This results in:
My Name
Favorite Food
Favorite Color
Time is gold
List of Dogs:
Shi_ba_Inu
Sibe_rian_Husky
Labra_dor_Retriever
Ger_man_Shep_herd
Bull_Doge
Be_agle
chi_hua_hua
Bull_Ter_rier
Explanation
sed commands can be separated by ;.
The first part is an address range of the form start,end: /List of Dogs/ finds the line where the range starts, and $ (the last line of the file) is where it ends.
So the search-and-replace command s/\s/_/g is applied only to the lines in that address range.
Unfortunately, the range includes the "List of Dogs:" line itself, which the command turns into List_of_Dogs:, so the second command, s/List_of_Dogs/List of Dogs/g, is just a workaround to convert it back.
You have the answer and you don't know it =)
You say you want to replace the spaces, but you have not said what you want to replace them with. I suspect you want to remove them, i.e. replace them with nothing, right?
sed "s/ //g" "$original_file" > "$new_file"
Or, matching any whitespace character with \s (a GNU sed extension), the following should also work:
sed "s/\s//g" "$original_file" > "$new_file"
The syntax is basically:
sed "s/find_this/replace_with/g" "$original_file" > "$new_file"
I hope that helps...
Keep it simple, obvious, robust, portable, etc., and just use awk:
$ awk 'found{gsub(/[[:space:]]/,"_")} /:/{found=1} {print}' file
My Name
Favorite Food
Favorite Color
Time is gold
List of Dogs:
Shi_ba_Inu
Sibe_rian_Husky
Labra_dor_Retriever
Ger_man_Shep_herd
Bull_Doge
Be_agle
chi_hua_hua
Bull_Ter_rier
I can find my lines with this pattern, but in some cases the info is on the line after the match. How can I also get the line following the matching line?
sed -n '/SQL3227W Record token/p' /log/PLAN_2015-08-16*.MSG >ERRORS.txt
Firstly, this looks like a job for grep:
grep -A 1 'SQL3227W Record token' /log/PLAN_2015-08-16*.MSG >ERRORS.txt
(-A 1 means to print an additional 1 line After the match).
Secondly, if you're using GNU sed, you can use a second address of +1 thus:
sed -n '/SQL3227W Record token/,+1p' /log/PLAN_2015-08-16*.MSG >ERRORS.txt
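Otherwise (if you really must use a non-GNU sed), loop over the matches yourself: append the following line to the pattern space with N, print the matching line with P, drop it, and check whether the line that is left is also a match before printing it. A plain N;P;D would not work here, because with -n the line left over after D is never printed. The loop below also handles consecutive matches and a match on the last line:
#!/bin/sed -nf
# Print every matching line plus the line that follows it (POSIX sed).
/SQL3227W Record token/{
:a
$!N
P
/\n/!d
s/.*\n//
/SQL3227W Record token/ba
p
}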
sed is for simple substitutions on individual lines, that is all. For anything even slightly more interesting, just use awk:
awk '/SQL3227W Record token/{c=2} c&&c--' file
See "Printing with sed or awk a line following a matching pattern" for other related idioms.
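In that awk one-liner, c counts how many lines are still to be printed: a match sets it to 2 (the match plus one more line), and c&&c-- is true, and decrements c, while c is still positive. A sketch generalizing it to n following lines, where n is a hypothetical -v variable and the log lines are made up to show that back-to-back matches are handled:

```shell
# Made-up log lines; n = number of lines to print after each match
printf '%s\n' \
  'ok' \
  'SQL3227W Record token 1' \
  'detail A' \
  'ok' \
  'SQL3227W Record token 2' \
  'SQL3227W Record token 3' \
  'detail B' \
  'ok' |
  awk -v n=1 '/SQL3227W Record token/ { c = n + 1 } c && c--'
```

This should print the three token lines plus detail A and detail B, and none of the ok lines.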
How can I combine multiple filters using sed?
Here's my data set
sex,city,age
male,london,32
male,manchester,32
male,oxford,64
female,oxford,23
female,london,33
male,oxford,45
I want to identify all lines which contain MALE AND OXFORD. Here's my approach:
sed -n '/male/,/oxford/p' file
You can associate a block with the first check and put the second in there. For example:
sed -n '/male/ { /oxford/ p; }' file
Or invert the check and action:
sed '/male/!d; /oxford/!d' file
However, since (as @Jotne points out) lines that contain female also contain male, and you probably don't want to match them, the patterns should at least be amended to contain word boundaries (\< and \> are GNU sed extensions):
sed -n '/\<male\>/ { /\<oxford\>/ p; }' file
sed '/\<male\>/!d; /\<oxford\>/!d' file
But since that looks like comma-separated data, and the check is probably not meant to test whether someone went to a male university, it would probably be best to use a stricter check with awk:
awk -F, '$1 == "male" && $2 == "oxford"' file
This checks not only if a line contains male and oxford but also if they are in the appropriate fields. The same can be achieved, somewhat less prettily, with sed by using
sed '/^male,oxford,/!d' file
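Running the unanchored block-based command and the field-anchored one on the sample data makes the difference visible: plain /male/ also matches inside female, so the loose version picks up female,oxford,23 as well. A small sketch with the data set inlined:

```shell
data='sex,city,age
male,london,32
male,manchester,32
male,oxford,64
female,oxford,23
female,london,33
male,oxford,45'

echo 'loose match:'
printf '%s\n' "$data" | sed -n '/male/ { /oxford/ p; }'

echo 'anchored at the start of the line:'
printf '%s\n' "$data" | sed '/^male,oxford,/!d'
```

The loose version should print male,oxford,64, female,oxford,23 and male,oxford,45; the anchored one only the two male,oxford rows.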
A single sed command can be used to solve this. Let's look at two variations:
$ sed -e 's/^\(male,oxford,.*\)$/\1/;t;d' file
male,oxford,64
male,oxford,45
$ sed -e 's/^male,oxford,\(.*\)$/\1/;t;d' file
64
45
Both use essentially the same regex:
^male,oxford,.*$
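The interesting features are the capture group placement (either the whole line or just the age portion) and the use of ;t;d to discard non-matching lines: t branches to the end of the script (so the line is printed) only if the preceding s command made a substitution; otherwise d deletes the line.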
By doing it this way, we can avoid the requirement of using awk or grep to solve this problem.
You can use awk:
awk -F, '/\<male\>/ && /\<oxford\>/' file
male,oxford,64
male,oxford,45
It uses word anchors (\< and \>, supported by GNU awk but not by every awk) to avoid matching female.
Consider the input:
=sec1=
some-line
some-other-line

foo
bar=baz

=sec2=
c=baz
If I wish to process only =sec1= I can for example comment out the section by:
sed -e '/=sec1=/,/=[a-z]*=/s:^:#:' < input
... well, almost.
This also comments out the "=sec1=" and "=sec2=" lines themselves, and the result will be something like:
#=sec1=
#some-line
#some-other-line
#
#foo
#bar=baz
#
#=sec2=
c=baz
My question is: What is the easiest way to exclude the start and end lines from a /START/,/END/ range in sed?
I know that in this specific case refining the "s:::" clause can give a solution, but I am after a generic solution here.
In "Sed - An Introduction and Tutorial" Bruce Barnett writes: "I will show you later how to restrict a command up to, but not including the line containing the specified pattern.", but I was not able to find where he actually show this.
In "USEFUL ONE-LINE SCRIPTS FOR SED", compiled by Eric Pement, I could find only the inclusive example:
# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive
This should do the trick:
sed -e '/=sec1=/,/=sec2=/ { /=sec1=/b; /=sec2=/b; s/^/#/ }' < input
This matches the lines from sec1 to sec2 inclusively and then just skips the first and last lines with the b command. This leaves the desired lines between sec1 and sec2 (exclusive), and the s command adds the comment sign.
Unfortunately, you do need to repeat the regexps for matching the delimiters. As far as I know there's no better way to do this. At least you can keep the regexps clean, even though they're used twice.
This is adapted from the SED FAQ: How do I address all the lines between RE1 and RE2, excluding the lines themselves?
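To avoid typing each delimiter regex twice, they can be kept in shell variables and interpolated into the script. A sketch under that assumption (start and end are hypothetical variable names; their contents must not contain / or unintended regex metacharacters), written with newlines rather than semicolons so that the b commands are also valid in non-GNU seds, and fed the non-blank lines of the question's input:

```shell
# Hypothetical variables holding the delimiter regexes (each used twice below)
start='=sec1='
end='=sec2='

printf '%s\n' '=sec1=' some-line some-other-line foo 'bar=baz' '=sec2=' 'c=baz' |
  sed "/$start/,/$end/{
/$start/b
/$end/b
s/^/#/
}"
```

Only some-line, some-other-line, foo and bar=baz should get a leading #; the two delimiter lines and c=baz are printed unchanged.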
If you're not interested in lines outside of the range, but just want the non-inclusive variant of the Iowa/Montana example from the question (which is what brought me here), you can write the "except for the first and last matching lines" clause easily enough with a second sed:
sed -n '/PATTERN1/,/PATTERN2/p' < input | sed '1d;$d'
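Personally, I find this slightly clearer (albeit slower on large files) than the one-pass equivalent below. One caveat with the one-pass version: if PATTERN1 is on line 1, it deletes everything, because the end of a 1,/regex/ range is only searched for from line 2 onward (GNU sed's 0,/PATTERN1/ address avoids this):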
sed -n '1,/PATTERN1/d;/PATTERN2/q;p' < input
Another way would be:
sed -n '/begin/,/end/ {
/begin/n
/end/ !p
}'
/begin/n -> skip over the line that has the "begin" pattern
/end/ !p -> print all lines that don't have the "end" pattern
Taken from Bruce Barnett's sed tutorial http://www.grymoire.com/Unix/Sed.html#toc-uh-35a
I've used:
sed -n '/begin/,/end/{/begin\|end/!p;}'
This selects all the lines between the patterns (inclusive), then prints only those that contain neither pattern.
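Not every sed supports \| alternation in a basic regex. A sketch of the same exclusive range in POSIX sed uses two separate d commands inside the block instead; begin and end stand for whatever delimiter regexes you need, and the input lines are made up:

```shell
# Print only the lines strictly between "begin" and "end"
printf '%s\n' before begin one two end after |
  sed -n '/begin/,/end/{/begin/d;/end/d;p;}'
```

This should print one and two.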
You could also use awk:
awk '/sec1/{f=1;print;next}f && !/sec2/{ $0="#"$0}/sec2/{f=0}1' file