Scenario:
#Start1
text
text
text
#Start2
text
#Start3
text
text
As shown above, the blocks do not contain a fixed number of lines. The question is how to use a Unix command to remove one of the blocks, say #Start2.
I tried
sed '/Start2/,+1d' <file_name> | less
Yes, it probably works for this case, if I can be sure how many lines follow #Start2.
But what if the number of lines following it is not constant each time #Start2 appears? I might fail to remove all of them, or accidentally remove some content of another block.
I need a more reliable way of detecting that I have hit the end of the block. Is there one?
Thanks
Using awk you can do this:
awk '/^#/ {f=0} /^#Start2/ {f=1} !f;' file
#Start1
text
text
text
#Start3
text
text
This awk prints every line as long as f is 0, i.e. not true.
f starts out with awk's default empty value, which is the same as 0.
If a line starts with #, f is set to 0, so the line is printed.
If a line starts with #Start2, f is then set to 1, so the line is not printed.
This makes awk print all lines, stop printing when it finds #Start2, and continue again at the next line starting with #.
Another way to do it:
awk '!/Start2/' RS=# ORS=# file
#Start1
text
text
text
#Start3
text
text
This tells awk that records are separated by # (RS=#).
If a record contains Start2, it is not printed; all other records are printed.
PatStart="#Start2"
PatBlockEnd="^#"
sed -n "/${PatStart}/,/${PatBlockEnd}/ {
/${PatStart}/ {x;s/.*//;x;}
x;p;}" YourFile
Use the two Pat... variables to set your block borders (PatBlockEnd assumes a new block starts with # on a NEW LINE; it matches the start of the next block, not the end of the current one).
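A simpler alternative sketch (not part of the answer above; it makes the same assumption that every block starts with # at the beginning of a line): within the range from the #Start2 header to the next block header, delete everything except the header that opens the next block. Tested with GNU sed; older seds may want the braced commands on separate lines.

```shell
# Within the range from the #Start2 header to the next line beginning
# with '#', delete the #Start2 line itself and every non-header line.
# The '#Start...' header that opens the next block is kept.
sed '/^#Start2/,/^#/{/^#Start2/d;/^#/!d}' file
```

If #Start2 happens to be the last block in the file, the range simply runs to end-of-file and the whole block is still removed.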
I'm trying to split a huge .txt file into multiples .txt files containing just one paragraph each.
Let me provide an example. I would need a text like this:
This is the first paragraph. It makes no sense because is just an example.
This a second paragraph, as meaningless as the previous one.
Saved as two independent .txt files containing the first paragraph (the first file) and the second paragraph (the second file).
The first file would have only: "This is the first paragraph. It makes no sense because is just an example."
And the second one: "This a second paragraph, as meaningless as the previous one."
And the same for the whole text. In the huge .txt file paragraphs are divided by one or several empty lines. Ideas?
Thank you very much!
I created a three-paragraph example and am using your comment here to recreate what I think you're describing.
text <- "This is the first paragraph. It makes no sense because is just an example. Nothing makes sense and I'm trying to understand what I'm doing with life. This paragraph does not seem to end.
What are we doing here.
This a second paragraph, as meaningless as the previous one.
There's too much to do - this is meaningless though.
Wow, that's funny."
paras <- unlist(strsplit(text, "\n\n"))
for (i in 1:length(paras)) {
write.table(paras[i], file = paste0("paragraph", i, ".txt"), row.names = FALSE, col.names = FALSE, quote = FALSE)
}
This code first assigns the text to the variable text, then uses the strsplit function with the argument "\n\n" to split the text at each blank line (double newline); unlist flattens the result into a character vector.
Then, a for loop goes through each element and saves it into a separate .txt file.
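Note that splitting on "\n\n" handles exactly one blank line between paragraphs, while the question says paragraphs may be divided by one or several empty lines. As a shell-side alternative (a sketch, not part of the answer above; input.txt is a placeholder name), awk's paragraph mode treats any run of blank lines as a single separator:

```shell
# Setting RS to the empty string puts awk in paragraph mode: each
# blank-line-separated paragraph becomes one record, and NR numbers them.
awk -v RS= '{ print > ("paragraph" NR ".txt") }' input.txt
```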
I have refactored a man page's paragraph so that each sentence is on its own line. When rendering with man ./somefile.3, the output is slightly different.
Let me show an example:
This is line 1. This is line 2.
vs.
This is line 1.
This is line 2.
These render like so:
First:
This is line 1. This is line 2.
Second:
This is line 1.  This is line 2.
There is an extra space between the sentences. Note that I have made sure that there is no extra white space in the source. I have more experience with LaTeX, asciidoc, and markdown, where I can control this; is it possible with troff/groff? I'd like to avoid the extra space if possible. I don't think it should be there.
The troff input convention is to put a newline at the end of each sentence, and to let the typesetter do its job with filling. (Although I doubt it was the intent, this also makes it play nicer with source control.) Accordingly, troff considers a sentence to end at the end of a line that ends with a period (or ? or !, optionally followed by ', ", *, ], ), or †). It also believes that sentences should have two spaces between them. This almost certainly derives from the typography standards at Bell Labs at the time; it's rather curious that this behavior is not settable through any fill modes.
groff does provide a way to set the "inter-sentence" spacing, with the extended .ss request:
.ss word_space_size [sentence_space_size]
Change the size of a space between words. It takes its units as one
twelfth of the space width parameter for the current font. Initially
both the word_space_size and sentence_space_size are 12. In fill mode,
the values specify the minimum distance.
If two arguments are given to the ss request, the second argument sets
the sentence space size. If the second argument is not given, sentence
space size is set to word_space_size. The sentence space size is used
in two circumstances: If the end of a sentence occurs at the end of a
line in fill mode, then both an inter-word space and a sentence space
are added; if two spaces follow the end of a sentence in the middle of
a line, then the second space is a sentence space. If a second
argument is never given to the ss request, the behaviour of UNIX troff
is the same as that exhibited by GNU troff. In GNU troff, as in UNIX
troff, a sentence should always be followed by either a newline or two
spaces.
So you can specify that the "sentence space" should be zero-width by making the request
.ss 12 0
As far as I know, this is a groff extension; Heirloom troff supports it, but older dwb-derived versions may not.
Example:
This is line 1. This is line 2.
This is line 1. This is line 2.
This is line 1.
This is line 2.
SET SENTENCE SPACING
.ss 12 0
This is line 1. This is line 2.
This is line 1. This is line 2.
This is line 1.
This is line 2.
Results:
$ groff -T ascii spaces.tr | sed -n -e '/./p'
This is line 1. This is line 2.
This is line 1. This is line 2.
This is line 1.  This is line 2.
SET SENTENCE SPACING
This is line 1. This is line 2.
This is line 1. This is line 2.
This is line 1. This is line 2.
So the following will work, but I hope there is a better option.
This is line 1. \
This is line 2.
renders as
This is line 1. This is line 2.
I found a command which takes the input data from a binary file and writes into an output file.
nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg
It's working, but I am not able to figure out how. Could anyone please help me understand how the above command works and what it is doing? This nawk is too tough for me to understand. :(
Thanks in advance.
nawk is not tough to understand; it is much like other languages. I guess you are not able to understand it because it is not properly formatted; if you format it, you will see how it works.
To answer your question: this command searches the given input file for lines containing a given text, and prints a few lines before and a few lines after each matched line. How many lines are printed is controlled by the variables b (number of lines before) and a (number of lines after), and the string/text to search for is passed in the variable s.
This command is helpful in debugging/troubleshooting, where one wants to extract lines from large log files (difficult to open in vi or another editor on Unix/Linux) by searching for some error text and printing some lines above and below it.
So in your command
b=1 ## means print only 1 line before the matching line
a=19 ## means print 19 lines after the matching line
s="<Comment>Ericsson_OCS_V1_0.0.0.7" ## means search for this string
/var/opt/fds/config/ServiceConfig/ServiceConfig.cfg ## search in this file
/opt/temp/"$circle"_"$sdpid"_RG.cfg ## store the output in this file
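If you only need the behaviour rather than the awk, GNU grep's context options do the same extraction directly (a sketch with the same values; note that grep prints a -- separator line between non-adjacent groups of matching lines):

```shell
# -B 1: print 1 line of leading context; -A 19: print 19 lines of
# trailing context around each line matching the pattern.
grep -B 1 -A 19 '<Comment>Ericsson_OCS_V1_0.0.0.7' /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg
```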
Your formatted command is below. The very first condition, which looked like c-->0 before formatting, is easy to interpret: it means c-- greater than 0. (Note that in awk an action's opening brace must stay on the same line as its pattern; otherwise the pattern gets a default print action.) The NR variable in awk gives the line number of the line currently being processed.
nawk '
c-- > 0;
$0 ~ s {
    if (b)
        for (c = b + 1; c > 1; c--)
            print r[(NR - c + 1) % b];
    print;
    c = a
}
b {
    r[NR % b] = $0
}' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg
Say I have an input stream consisting of lines separated into a certain number of fields. I would like to cut on the various fields, pipe a certain field (or fields) to a program (which is assumed to return one line for each input line) and leave the other fields as is, and paste the results back together. I can probably imagine convoluted solutions, but there ought to be a clean and natural way to do that.
As a specific example, say I have a program producing lines of the form:
$ inputprog
<a> hello world!
<b> hi everyone!
<a> hi!
Say I would like to put the message in uppercase while leaving the first field unchanged. Here is how I would imagine things:
$ inputprog | program -d' ' -f2- "tr a-z A-Z"
<a> HELLO WORLD!
<b> HI EVERYONE!
<a> HI!
I am looking for a reasonably clean way to approximate program. (I am not interested in solutions which are specific to this example.)
Thanks in advance for your help!
awk can do what you want. For example:
$ echo "field1 field2" | awk '{$2 = toupper($2); print;}'
field1 FIELD2
Comes pretty close to what you want to do. $2 = toupper($2); changes the second field, while print prints out the whole (modified) line.
However, there is the question of how you define a 'field'. In the example above, fields are separated by spaces (you can change the field separator to an arbitrary regexp, e.g. -F'<[a-zA-Z]+>', which would treat tags like <a> as field separators).
But in your example you seem to view <a> as one field and hello world! as another. Any program could only arrive at your desired behaviour by wild guessing. Why wouldn't world! be considered a third field?
So, if you can get input with a clear policy of separating fields, awk is exactly what you want.
Check out pages like http://people.cs.uu.nl/piet/docs/nawk/nawk_92.html (awk string functions) and http://www.pement.org/awk/awk1line.txt (awk 1 liners) for more information.
BTW, one could also make your specific example above work by looping over all the fields except the first one (NF == Number of Fields):
$ echo "<a> hello world!
<b> hi everyone!
<a> hi" |
awk '{for(i=2;i<=NF;++i) { $i=toupper($i); }; print;}'
<a> HELLO WORLD!
<b> HI EVERYONE!
<a> HI
Even though you are not interested in the solution to this example. ;-)
P.S.: sed should also be able to do the job (http://en.wikipedia.org/wiki/Sed)
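For instance, with GNU sed (a sketch; the \U case-conversion escape is a GNU extension, not POSIX):

```shell
# Capture the leading <...> tag plus its trailing space as group 1,
# then \U uppercases the rest of the replacement (group 2).
echo '<a> hello world!' | sed -E 's/^(<[^>]*> )(.*)/\1\U\2/'
```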
I want to read a long text file in two-column format on my terminal. This means that the columns must be page-aware, so that text at the bottom of the first column continues at the top of the second column, but text at the bottom of the second column continues at the beginning of the first column after a page-down.
I tried column and less to get this result, but with no luck. If I pipe the text into column, it produces two columns but truncates the text before it reaches the end of the file. And if I pipe the output of column into less, it also reverts back to single-column.
a2ps does what I want in the way of reformatting, but I would rather have the output in pure plain text, readable from the terminal, rather than a PostScript file that I would need to read in a PDF reader.
You can use pr for this, e.g.:
ls /usr/src/linux/drivers/char/ | pr -2 | less
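By default pr formats for a 66-line by 72-column page, so the columns are page-aware for that paper size rather than your screen. A sketch of adapting it to the current terminal (assuming tput is available; file.txt is a placeholder name):

```shell
# -t omits page headers/footers; -w and -l set the page width and
# length so each pr page matches one terminal screen.
pr -2 -t -w "$(tput cols)" -l "$(tput lines)" file.txt | less
```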