Replace all spaces in a certain part of a file in UNIX

Hi all, I'm trying to replace all spaces starting from a certain part of my file. I tried, but I can't make the replacement start at a certain point.
I tried sed "s/\s/_/g" < file.txt > file_1.txt, but all of the spaces turn into underscores.
Contents of file.txt:
My Name
Favorite Food
Favorite Color
Time is gold
List of Dogs:
Shi ba Inu
Sibe rian Husky
Labra dor Retriever
Ger man Shep herd
Bull Doge
Be agle
chi hua hua
Bull Ter rier
expected file_1.txt:
My Name
Favorite Food
Favorite Color
Time is gold
List of Dogs:
Shi_ba_Inu
Sibe_rian_Husky
Labra_dor_Retriever
Ger_man_Shep_herd
Bull_Doge
Be_agle
chi_hua_hua
Bull_Ter_rier

If you want the substitution to happen only after "List of Dogs", try
sed -e '1,/List of Dogs:/b' -e 's/\s/_/g'
The command b means "branch" (to the end of the script, i.e. bypass the substitution) and the address range specifies this action for the first line through the first line matching the regex.
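As a minimal sketch of the branch approach, the snippet below feeds a shortened copy of the question's data through the same address range. The sample lines and the variable name result are only for illustration, and [[:space:]] stands in for \s so that the script does not depend on a GNU extension.

```shell
# Shortened sample of the question's file.txt, piped in directly.
# Lines 1 through the "List of Dogs:" header hit the b command and skip
# the substitution; every later line gets its whitespace replaced.
result=$(printf '%s\n' 'My Name' 'Time is gold' 'List of Dogs:' 'Shi ba Inu' 'Bull Doge' |
    sed -e '1,/List of Dogs:/b' -e 's/[[:space:]]/_/g')
printf '%s\n' "$result"
```

Because the header line is still inside the branched range, it keeps its spaces and no clean-up step is needed afterwards.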

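If you want the substitution to happen only from the line containing : onward, use something like this:
sed -r '/:/,$ s/\s/_/g;' file.txt > file_1.txt
The substitution is restricted to the range from the first line containing : through the end of the file ($). Note that the range includes the matching line itself, so List of Dogs: also becomes List_of_Dogs:.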

Given your initial input file.txt:
My Name
Favorite Food
Favorite Color
Time is gold
List of Dogs:
Shi ba Inu
Sibe rian Husky
Labra dor Retriever
Ger man Shep herd
Bull Doge
Be agle
chi hua hua
Bull Ter rier
You can try this:
$ sed '/List of Dogs/,$s/\s/_/g;s/List_of_Dogs/List of Dogs/g' file.txt
Which results in:
My Name
Favorite Food
Favorite Color
Time is gold
List of Dogs:
Shi_ba_Inu
Sibe_rian_Husky
Labra_dor_Retriever
Ger_man_Shep_herd
Bull_Doge
Be_agle
chi_hua_hua
Bull_Ter_rier
Explanation:
sed commands can be separated by ;.
The first part starts with an address of the form start,end. /List of Dogs/ finds the line where the list starts, and $ specifies the last line of the file as the end of the range.
The search-and-replace command s/\s/_/g is applied only within this address range.
Unfortunately, the range includes the header line, so the command also produces List_of_Dogs:. The second command, s/List_of_Dogs/List of Dogs/g, is just a workaround to convert it back.

You have the answer and you don't know it =)
You say you want to replace the spaces, but you have not said what you want to replace them with. I suspect you want to remove them, i.e. replace them with nothing, right?
sed "s/ //g" "$original_file" > "$new_file"
or, referencing the space with \s, the following should also work:
sed "s/\s//g" "$original_file" > "$new_file"
The syntax is basically:
sed "s/find_this/replace_with/g" "$original_file" > "$new_file"
I hope that helps...

Keep it simple, obvious, robust, portable, etc. and just use awk:
$ awk 'found{gsub(/[[:space:]]/,"_")} /:/{found=1} {print}' file
My Name
Favorite Food
Favorite Color
Time is gold
List of Dogs:
Shi_ba_Inu
Sibe_rian_Husky
Labra_dor_Retriever
Ger_man_Shep_herd
Bull_Doge
Be_agle
chi_hua_hua
Bull_Ter_rier

Related

How can I remove digits after a tab?

I have two columns in my file.
Example: The first column has movie titles and the second column has its ratings.
Planet51 48
Avengers 97
Aladdin 61
I want to remove the ratings from the file and keep just the column containing movie titles, using sed. I am using the command sed 's/[0-9]//g' input > output.
However, this removes all digits in the file, so my output is
Planet
Avengers
Aladdin
instead of
Planet51
Avengers
Aladdin
How can I fix my sed command so that it only removes digits after the tab? I tried messing around with some metacharacters (specifically \t), but I just confused myself.
If you have the GNU coreutils installed:
$ cut -f1 file.txt
Planet51
Avengers
Aladdin
You can just do
sed 's/\t[0-9]*//' input > output
or indeed just
sed 's/\t.*//' input > output
to delete everything after the tab.
(Tested with GNU sed; the \t doesn't seem to be guaranteed by POSIX. In a script it might be more portable to let the command contain a literal tab character instead of \t).
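One way to avoid relying on \t is sketched below: a literal tab is produced with printf and interpolated into the sed script. The variable names tab and result are hypothetical, and the sample rows are copied from the question.

```shell
# Hold a literal tab in a variable so the sed script does not rely on \t.
tab=$(printf '\t')

# Double quotes let the shell expand ${tab} before sed sees the script.
result=$(printf 'Planet51\t48\nAvengers\t97\nAladdin\t61\n' | sed "s/${tab}.*//")
printf '%s\n' "$result"
```

Command substitution only strips trailing newlines, so the tab survives in the variable.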
Why not just print the first column? (If a title can contain spaces, set the field separator to a tab with -F'\t'.)
awk '{print $1}' file
Planet51
Avengers
Aladdin

How to replace every second occurrence of a word in a text file

In a file called sample.txt, I have the following text:
Once there is a tortoise and a rabbit. The rabbit was fast, tortoise was slow.
Rabbit used to mock the tortoise. Once the rabbit challenged the tortoise for a race.
Tortoise accepted rabbit’s request. Rabbit was overconfident.
Rabbit thought to win the race. Rabbit ran fast. Then rabbit got tired. Rabbit wanted to take rest. So rabbit slept under the tree.
Tortoise kept going and won the race.
How to replace every second occurrence of rabbit to hare using Unix commands?
When the input is one line (or you are happy to restart counting at the beginning of each line) and you want to ignore the uppercase Rabbit, you can use this solution:
First replace every rabbit with a single character that sed can match.
Then replace every second rabbit character and restore the others.
sed -r 's/rabbit/\r/g; s/(\r[^\r]*)\r/\1hare/g; s/\r/rabbit/g' sample.txt
Edit, Additional explanation:
When the input file is a clean Unix-style file (no MS-DOS line endings \r\n), we know that the character \r does not otherwise occur in it. After sed -r 's/rabbit/\r/g', each rabbit is represented by \r (the r here stands for carriage return, not for the first letter of rabbit).
Now you want to look for sequences <rabbit><not-a-rabbit><rabbit>. In our new notation that is the sequence \r[^\r]*\r, where [^\r]* stands for any sequence of characters without the rabbit character.
When we find two rabbits, we want to remember the first rabbit together with the non-rabbit characters. In sed you can remember a matched sequence with \(..\), or use the option -r and (..). You can recall the first memory location (we only have one here) with \1, in this case the first rabbit \r and the non-rabbit characters. The second rabbit \r is replaced by hare.
After replacing the second \r (globally on the line, so every second one), we want to transform the remaining \r rabbits back into the string rabbit.
More possibilities
When your input file has more than one line, you might want something different. With one rabbit on the first line and one on the second, how can you catch the second rabbit? Before running the sed command above, you need to join the input into a single line. Afterwards you want to restore the line endings, so you need to replace them with a special character. Normally I would use \r for this, but that character is reserved for the rabbits. The character \v is possible too, resulting in:
tr '\n' '\v' < sample.txt |
sed -r 's/rabbit/\r/g; s/(\r[^\r]*)\r/\1hare/g; s/\r/rabbit/g' |
tr '\v' '\n'
When you also want to replace uppercase Rabbits, you can translate those into \a.
You can then match any rabbit (uppercase or lowercase) with [\r\a], which makes the command one level more complex:
tr '\n' '\v' < sample.txt |
sed -r 's/rabbit/\r/g; s/Rabbit/\a/g;
s/([\r\a][^\r\a]*)[\r\a]/\1hare/g;
s/\r/rabbit/g; s/\a/Rabbit/g' |
tr '\v' '\n'
When you want to replace the uppercase Rabbit \a with an uppercase Hare, the command gets even more complex (you need another special character).
Here \x01 marks a [Rr]abbit that is to be changed:
tr '\n' '\v' < sample.txt |
sed -r 's/rabbit/\r/g;
s/Rabbit/\a/g;
s/([\r\a][^\r\a]*)([\r\a])/\1\x01\2/g;
s/\x01\r/hare/g;
s/\x01\a/Hare/g;
s/\r/rabbit/g; s/\a/Rabbit/g' |
tr '\v' '\n'
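For the multi-line case, an alternative to the placeholder-character trick is to keep a running match counter in awk. This is a different technique from the sed pipeline above, sketched under the assumption that only the lowercase word should be counted. The variables word, repl and result and the three sample lines are made up for illustration.

```shell
# Count matches of the lowercase word across the whole input; every
# even-numbered match becomes "hare", odd-numbered ones are kept.
result=$(printf '%s\n' 'a rabbit and a rabbit' 'one rabbit' 'rabbit, rabbit' |
awk -v word=rabbit -v repl=hare '
{
    out = ""; rest = $0
    # Walk through the line one match at a time.
    while ((i = index(rest, word)) > 0) {
        n++                                   # n is never reset, so it spans lines
        out = out substr(rest, 1, i - 1) (n % 2 == 0 ? repl : word)
        rest = substr(rest, i + length(word))
    }
    print out rest
}')
printf '%s\n' "$result"
```

Because the counter carries over from one line to the next, no tr round trip is needed to join and split the lines.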
$ sed 's/[Rr]abbit/hare/2' sample.txt

grep: how to show the next lines after the matched one until a blank line [not possible!]

I have a dictionary (not python dict) consisting of many text files like this:
##Berlin
-capital of Germany
-3.5 million inhabitants
##Earth
-planet
How can I show one entry of the dictionary with the facts?
Thank you!
You can't. grep doesn't have a way of showing a variable amount of context. You can use -A to show a set number of lines after the match, such as -A3 to show three lines after a match, but it can't be a variable number of lines.
You could write a quick Perl program to read from the file in "paragraph mode" and then print blocks that match a regular expression.
As Andy Lester pointed out, grep can't show a variable amount of context, but a short awk statement might do what you're hoping for.
If your example file were named file.dict (IGNORECASE requires GNU awk):
awk -v term="earth" 'BEGIN{IGNORECASE=1}{if($0 ~ "##"term){loop=1} if($0 ~ /^$/){loop=0} if(loop == 1){print $0}}' *.dict
returns:
##Earth
-planet
Just change the variable term to the entry you're looking for.
This assumes two things:
dictionary files have same extension (.dict for example purposes)
dictionary files are all in same directory (where command is called)
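Where GNU awk is not available, the same idea can be sketched with tolower() instead of IGNORECASE. The flag name show and the variable result are hypothetical, the sketch matches the header exactly rather than as a substring, and it assumes entries are separated by a blank line or by the next ## header.

```shell
# Dictionary sample from the question, with a blank line between entries.
result=$(printf '%s\n' '##Berlin' '-capital of Germany' '-3.5 million inhabitants' '' '##Earth' '-planet' |
awk -v term=berlin '
/^##/ { show = (tolower($0) == ("##" tolower(term))) }  # a header switches printing on or off
/^$/  { show = 0 }                                       # a blank line ends the entry
show                                                     # print while inside the wanted entry
')
printf '%s\n' "$result"
```

Treating every ## header as a switch means the entry also ends cleanly when the blank separator line is missing.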
If your grep supports perl regular expressions, you can do it like this:
grep -iPzo '(?s)##Berlin.*?\n(\n|$)'
See this answer for more on this pattern.
You could also do it with GNU sed like this:
query=berlin
sed -n "/$query/I"'{ :a; $p; N; /\n$/!ba; p; }'
That is, when case-insensitive $query is found, print until an empty line is found (/\n$/) or the end of file ($p).
Output in both cases (minor difference in whitespace):
##Berlin
-capital of Germany
-3.5 million inhabitants

Skipping the first n lines when using regex with sed?

In sed, is it possible to skip the first n lines when applying a regex? I am currently using the following:
cat test | sed '/^Name/d;/^----------/1;/^(/d;/^$/d'
on the following file:
Name
John
Albert
Mora
Name
Tommy
Tammy
In one pass, I want to use some regexes (one of which is to remove the line containing Name but I want to skip the first line in this case) to obtain the following:
Name
John
Albert
Mora
Tommy
Tammy
Because the file is huge, I don't want to make multiple passes so any one-pass approaches would be great.
Yes, you can apply sed commands to ranges of lines with the N,M syntax. In this case you want something like this:
sed -e '2,$s/foo/bar/'
An example with delete:
sed -e '2,${ /^Name/d }'
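A quick check of the delete example against the question's sample data might look like the sketch below. The variable name result is only for illustration, and the semicolon before the closing brace is added for seds that are stricter than GNU sed.

```shell
# From line 2 to the end of input, delete lines starting with "Name";
# line 1 is outside the address range and is left alone.
result=$(printf '%s\n' Name John Albert Mora Name Tommy Tammy |
    sed -e '2,$ { /^Name/d; }')
printf '%s\n' "$result"
```

Further commands can go inside the same braces, so the other deletions still run in a single pass.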

excluding first and last lines from sed /START/,/END/

Consider the input:
=sec1=
some-line
some-other-line
foo
bar=baz
=sec2=
c=baz
If I wish to process only =sec1= I can for example comment out the section by:
sed -e '/=sec1=/,/=[a-z]*=/s:^:#:' < input
... well, almost.
This comments out the whole range, including the "=sec1=" and "=sec2=" lines, and the result will be something like:
#=sec1=
#some-line
#some-other-line
#
#foo
#bar=baz
#
#=sec2=
c=baz
My question is: What is the easiest way to exclude the start and end lines from a /START/,/END/ range in sed?
I know that in many cases refining the "s:::" clause can solve the specific case, but I am after the generic solution here.
In "Sed - An Introduction and Tutorial" Bruce Barnett writes: "I will show you later how to restrict a command up to, but not including the line containing the specified pattern.", but I was not able to find where he actually shows this.
In the "USEFUL ONE-LINE SCRIPTS FOR SED" Compiled by Eric Pement, I could find only the inclusive example:
# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive
This should do the trick:
sed -e '/=sec1=/,/=sec2=/ { /=sec1=/b; /=sec2=/b; s/^/#/ }' < input
This matches between sec1 and sec2 inclusively and then just skips the first and last line with the b command. This leaves the desired lines between sec1 and sec2 (exclusive), and the s command adds the comment sign.
Unfortunately, you do need to repeat the regexps for matching the delimiters. As far as I know there's no better way to do this. At least you can keep the regexps clean, even though they're used twice.
This is adapted from the SED FAQ: How do I address all the lines between RE1 and RE2, excluding the lines themselves?
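The exclusive-range idiom can be tried on a shortened version of the question's input, as in the sketch below. The script is written one command per line so that each b ends at a newline, which should be acceptable to non-GNU seds as well. The variable name result is hypothetical.

```shell
# Inside the inclusive range, branch past the two delimiter lines and
# comment out everything in between.
result=$(printf '%s\n' '=sec1=' 'some-line' 'foo' 'bar=baz' '=sec2=' 'c=baz' |
sed -e '/=sec1=/,/=sec2=/ {
    /=sec1=/b
    /=sec2=/b
    s/^/#/
}')
printf '%s\n' "$result"
```

Only the three lines between the delimiters are commented out; c=baz lies outside the range and is untouched.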
If you're not interested in lines outside of the range, but just want the non-inclusive variant of the Iowa/Montana example from the question (which is what brought me here), you can write the "except for the first and last matching lines" clause easily enough with a second sed:
sed -n '/PATTERN1/,/PATTERN2/p' < input | sed '1d;$d'
Personally, I find this slightly clearer (albeit slower on large files) than the equivalent
sed -n '1,/PATTERN1/d;/PATTERN2/q;p' < input
Another way would be
sed -n '/begin/,/end/ {
/begin/n
/end/ !p
}'
/begin/n skips over the line that has the "begin" pattern.
/end/ !p prints all lines that don't have the "end" pattern.
Taken from Bruce Barnett's sed tutorial http://www.grymoire.com/Unix/Sed.html#toc-uh-35a
I've used:
sed -n '/begin/,/end/{/begin\|end/!p}'
This selects all the lines between the patterns, then prints those that do not contain either pattern.
you could also use awk
awk '/sec1/{f=1;print;next}f && !/sec2/{ $0="#"$0}/sec2/{f=0}1' file

Resources