Delete all lines after a string but the last few - unix

I have a file which contains some lines of text and some numbers. I want to delete all the lines after a particular string, say 'angles'. I can do that with sed as follows:
sed -n '/angles/q;p' inputfile
My question: how can I delete all the lines after angles but still retain the last four lines of the file, in sed?

The following should work for you:
tac inputfile | sed '5,/angles/d' | tac
Explanation: reverse the file, keep the first four lines (the last four of the original), delete everything from line 5 through the line matching the pattern, then reverse the result again.
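If you also want to keep the line containing angles itself, a variant of the same idea (a sketch, assuming GNU sed's nested addressing) is:
tac inputfile | sed '5,/angles/{/angles/!d}' | tac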

This might work for you (GNU sed):
sed '1,/angles/b;:a;$!{N;s/\n/&/4;Ta;D}' file
Print all the lines from the first up to and including the one containing angles; from there on, maintain a moving window of four lines, which is printed at end of file.
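Spelled out for readability, the same GNU sed script with its steps commented:
sed '
  # print everything up to and including the match, unchanged
  1,/angles/b
  :a
  $!{
    # append the next line to the pattern space
    N
    # this substitution succeeds only once there are four newlines (five lines)
    s/\n/&/4
    # not full yet: branch back and append another line
    Ta
    # window full: drop the oldest line and restart the cycle
    D
  }' file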

In awk:
awk '{
  if (p != 1) print;
  else a[NR] = $0
  if ($0 ~ /angles/) p = 1
}
END {
  for (i = NR - 3; i <= NR; i++) print a[i]
}' your_file
Tested below:
> cat temp
wbj
ergthet
gheheh
wergergkn angles gerg
er
ergnmer
rgnerk
1
2
3
4
> awk '{if(p!=1)print;else{a[NR]=$0};if($0~/angles/)p=1}END{for(i=NR-3;i<=NR;i++)print a[i]}' temp
wbj
ergthet
gheheh
wergergkn angles gerg
1
2
3
4
>

Count the lines first, then stop skipping in time to keep the final four:
awk -v size=$(wc -l < file) '
/angles/      { skip=1 }   # stop printing once the pattern is seen
NR > size - 4 { skip=0 }   # ...but always print the last four lines
!skip
' file

I'm going to assume the string you're looking for may occur more than once, and you only want everything up to the first occurrence. If that's true, reversing the file isn't the best strategy:
{ sed '/angles/q' file; tail -4 file; } > result
Note that sed '/angles/q' keeps the matching line itself, and that if the pattern falls within the last four lines the two pieces will overlap.
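Applied to the temp sample in the awk answer above, this gives:
$ { sed '/angles/q' temp; tail -4 temp; }
wbj
ergthet
gheheh
wergergkn angles gerg
1
2
3
4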

Related

unix ksh how to print $1 and first n characters of $2

I have a file as follows:
$ cat /etc/oratab
hostname01:DBNAME11:/oracle_home/A_19.0.0.0:N
hostname01:DBNAME1_DC:/oracle_home/A_19.0.0.0:N
hostname02:DBNAME21:/oracle_home/B_19.0.0.0:N
hostname02:DBNAME2_DC:/oracle_home/B_19.0.0.0:N
I want to print the unique first column, the first seven characters of the second column, and the third column, for the lines whose third column matches the string "19.0.0".
The output I want to see is:
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
I put together this piece of code, but it doesn't look like the correct way to do it:
cat /etc/oratab|grep "19.0.0"|awk '{print $1}' || awk -F":" '{print subsrt($2,1,8)}
Sorry, I am very new to shell scripting.
1st solution: with your shown sample, please try the following, written and tested with GNU awk:
awk 'BEGIN{FS=OFS=":"} {$2=substr($2,1,7)} !arr[$1,$2]++ && $3~/19\.0\.0/{NF--;print}' Input_file
2nd solution: in case your awk doesn't support NF--, try the following:
awk '
BEGIN {
  FS = OFS = ":"
}
{
  $2 = substr($2, 1, 7)
}
!arr[$1,$2]++ && $3 ~ /19\.0\.0/ {
  $4 = ""
  sub(/:$/, "")
  print
}
' Input_file
Explanation: set the field separator and output field separator to :. Then, in the main program, set the 2nd field to the first 7 characters of its value. If that $1,$2 combination hasn't occurred before and the 3rd field matches 19.0.0, drop the last field and print the line.
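With the shown sample, that produces:
$ awk 'BEGIN{FS=OFS=":"} {$2=substr($2,1,7)} !arr[$1,$2]++ && $3~/19\.0\.0/{NF--;print}' /etc/oratab
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0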
You may try this awk:
awk 'BEGIN{FS=OFS=":"} $3 ~ /19\.0\.0/ && !seen[$1]++ {
print $1, substr($2,1,7), $3}' /etc/oratab
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
We check and populate associative array seen only if we find 19.0.0 in $3.
If the lines can be like this, both matching 19.0.0:
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname01:DBNAME1:/oracle_home/A_19.0.0.1
and only hostname01 is used as the uniqueness key, you might miss the second line.
You could match the pattern with sed, using 2 capture groups for the parts you want to keep and matching what you don't want. Then pipe the output to uniq to drop duplicate whole lines rather than keying on the first column alone (uniq only collapses adjacent duplicates, which is fine here because the file is grouped by hostname):
sed -nE 's/^([^:]+:.{7})[^:]*(:[^:]*19\.0\.0[^:]*).*/\1\2/p' file | uniq
Output
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
$ awk 'BEGIN{FS=OFS=":"} index($3,"19.0.0"){print $1, substr($2,1,7), $3}' file | sort -u
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0

transpose a column in unix

I have a Unix file which has data like this.
1379545632,
1051908588,
229102020,
1202084378,
1102083491,
1882950083,
152212030,
1764071734,
1371766009,
I want to transpose it and print it as a single line, like this:
1379545632,1051908588,229102020,1202084378,1102083491,1882950083,152212030,1764071734,1371766009
Also remove the last comma.
Can someone help? I need a shell/awk solution.
tr -d '\n' < file.txt
To remove the last comma, pipe the result through sed 's/,$//'.
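Putting the two together:
$ tr -d '\n' < file.txt | sed 's/,$//'
1379545632,1051908588,229102020,1202084378,1102083491,1882950083,152212030,1764071734,1371766009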
With GNU awk, using a multi-char RS to slurp the whole file as one record:
$ printf 'x,\ny,\nz,\n' | awk -v RS='^$' '{gsub(/\n|(,\n$)/,"")} 1'
x,y,z
awk 'BEGIN { ORS="" } { print }' file
ORS: the Output Record Separator, which print appends after each record. Setting it to the empty string joins all the lines (note this keeps the trailing comma; strip it with sed 's/,$//' as above).
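If you want awk alone to drop that trailing comma too, a minimal sketch:
awk '{ sub(/,$/, ""); printf "%s%s", sep, $0; sep = "," } END { print "" }' file.txt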

Use sed to delete everything after '>' and add index number plus a string?

I know this should be pretty simple to do, but I can't get it to work. My file looks like this:
>c12345|random info goes here that I want to delete
AAAAATTTTTTTTCCCC
>c45678| more | random info| here
GGGGGGGGGGG
And what I want to do is make this far simpler, so it might look like this:
>seq1 [organism=human]
AAAAATTTTTTTTCCCC
>seq2 [organism=human]
GGGGGGGGGGGG
>seq3 [organism=human]
etc....
I know I can easily append that constant once I get the indexed part in there, by doing:
sed '/^>/ s/$/ [organism=human]/'
But how do I build that index?
With sed: delete the original headers, number the remaining lines, then turn those bare numbers into the new headers:
sed '/^>/d' filename | sed '=' | sed 's/^[0-9]*$/>seq& [organism=human]/'
(Thanks to NeronLeVelu for the simplification.)
Here's one way you could do it using awk:
$ awk '/^>/ { $0 = ">seq" ++i " [organism=human]" } 1' file
>seq1 [organism=human]
AAAAATTTTTTTTCCCC
>seq2 [organism=human]
GGGGGGGGGGG
When a line begins with >, it is replaced with >seq followed by i (a counter incremented on every header) and [organism=human]. The 1 at the end of the command is true, so awk performs the default action, which is to print the line.
Might be easier with a Perl one-liner:
perl -ne 'chomp; if (/^>/) { s/\|.*$//; print "$_ [organism=human]\n" } else { print "$_\n" }' filename
Note that this keeps the original IDs (e.g. >c12345) rather than renumbering them as >seqN.

Replacing a String Pattern with another sequence in unix

I want to replace the string TaskID_1 with a sequence value starting from 1001; TaskID_1 can occur on any number of lines in my input file. Similarly, I need to replace all occurrences of TaskID_2 in my input file with the next sequence value, 1002.
Input file:
12345|45345|TaskID_1|dksj|kdjfdsjf|12
1245|425345|TaskID_1|dksj|kdjfdsjf|12
1234|25345|TaskID_2|dksj|kdjfdsjf|12
123425|65345|TaskID_2|dksj|kdjfdsjf|12
123425|15325|TaskID_1|dksj|kdjfdsjf|12
11345|55315|TaskID_2|dksj|kdjfdsjf|12
6345|15345|TaskID_3|dksj|kdjfdsjf|12
72345|25345|TaskID_4|dksj|kdjfdsjf|12
9345|411345|TaskID_3|dksj|kdjfdsjf|12
The output file should look like:
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
Here's one way using awk:
awk 'BEGIN { FS=OFS="|" } { $3=1000 + NR }1' file
Or less verbosely:
awk -F '|' '{ $3=1000 + NR }1' OFS='|' file
Results:
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1002|dksj|kdjfdsjf|12
1234|25345|1003|dksj|kdjfdsjf|12
123425|65345|1004|dksj|kdjfdsjf|12
123425|15325|1005|dksj|kdjfdsjf|12
11345|55315|1006|dksj|kdjfdsjf|12
6345|15345|1007|dksj|kdjfdsjf|12
72345|25345|1008|dksj|kdjfdsjf|12
9345|411345|1009|dksj|kdjfdsjf|12
For the first example, the field separator and output field separator are both set to a single pipe character. This is done in the BEGIN block so that it executes only once, not for every line of input. We then set the third column equal to 1000 plus an incrementing value. We could use ++i as this value, but NR (short for record number, i.e. the line number) avoids the need for an extra variable. The 1 on the end enables printing by default. A more verbose solution would look like:
awk 'BEGIN { FS=OFS="|" } { $3=1000 + NR; print }' file
EDIT:
Using the updated data file, try:
awk 'BEGIN { FS=OFS="|" } { sub(/.*_/,"",$3); $3+=1000 }1' file
Results:
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
A Perl solution using Steve's logic of adding 1000:
perl -pe 's/TaskID_(\d+)/$1+1000/e' file
This replaces TaskID_n with 1000+n; the /e modifier evaluates the replacement as a Perl expression.
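Unlike the single-digit sed substitution below, this also handles multi-digit IDs, e.g.:
$ echo '72345|25345|TaskID_23|dksj|kdjfdsjf|12' | perl -pe 's/TaskID_(\d+)/$1+1000/e'
72345|25345|1023|dksj|kdjfdsjf|12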
Replacing TaskID_ with 100 is super easy with sed, for single-digit IDs:
$ sed 's/TaskID_/100/' file
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
To store this change back to the file use the -i option:
sed -i 's/TaskID_/100/' file
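(On BSD/macOS sed, -i requires an explicit, possibly empty, backup suffix: sed -i '' 's/TaskID_/100/' file.)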
Note: this only works for TaskID_[0-9]; if you want TaskID_23 mapped to 1023, this won't do it; it would map TaskID_23 to 10023.
I can't come up with a better solution than the one steve suggested in awk.
So here's a worse solution, using only bash.
#!/bin/bash
IFS='|'
while read f1 f2 f3 f4 f5 f6; do
  printf '%s|%s|%d|%s|%s|%s\n' "$f1" "$f2" "$((${f3#*_} + 1000))" "$f4" "$f5" "$f6"
done < input
It's "worse" only because it'll be much slower than awk, which is fast and efficient with this sort of problem.
perl -F"\|" -lane '$F[2]=~s/.*_/100/g;print join("|",#F)' your_file
Tested Below:
> cat temp
12345|45345|TaskID_1|dksj|kdjfdsjf|12
1245|425345|TaskID_1|dksj|kdjfdsjf|12
1234|25345|TaskID_2|dksj|kdjfdsjf|12
123425|65345|TaskID_2|dksj|kdjfdsjf|12
123425|15325|TaskID_1|dksj|kdjfdsjf|12
11345|55315|TaskID_2|dksj|kdjfdsjf|12
6345|15345|TaskID_3|dksj|kdjfdsjf|12
72345|25345|TaskID_4|dksj|kdjfdsjf|12
9345|411345|TaskID_3|dksj|kdjfdsjf|12
> perl -F"\|" -lane '$F[2]=~s/.*_/100/g;print join("|",@F)' temp
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
>

How do I delete a matching line and the previous one?

I need to delete a matching line and the one previous to it.
E.g. in the file below, I need to remove lines 1 & 2.
I tried grep -v -B 1 "page.of." 1.txt and expected it to omit the matching lines and their context.
I tried How do I delete a matching line, the line above and the one below it, using sed? but could not understand the sed usage.
---1.txt--
**document 1** -> 1
**page 1 of 2** -> 2
testoing
testing
super crap blah
**document 1**
**page 2 of 2**
You want to do something very similar to the answer given there:
sed -n '
/page . of ./ {   # when the pattern matches
  n               # read the next line into the pattern space
  x               # exchange the pattern and hold space
  d               # discard the current pattern space (the previous line)
}
x                 # for each line, exchange the pattern and hold space
1d                # skip the first line
p                 # and print the contents of the pattern space (the previous line)
$ {               # on the last line
  x               # exchange pattern and hold; pattern now contains the last line read
  p               # and print that
}'
And as a single line
sed -n '/page . of ./{n;x;d;};x;1d;p;${x;p;}' 1.txt
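Run against the seven lines of 1.txt shown above (without the -> markers), that gives:
$ sed -n '/page . of ./{n;x;d;};x;1d;p;${x;p;}' 1.txt
testoing
testing
super crap blah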
grep -v -B1 doesn't work because it skips the matching lines but then includes them again as leading context (due to the -B1). To check this out, try the command on:
**document 1** -> 1
**page 1 of 2** -> 2
**document 1**
**page 2 of 2**
**page 3 of 2**
You will notice that the page 1 of 2 line still shows up in the output: it immediately precedes a non-matching line, so -B1 prints it back as context.
There's a simple awk solution:
awk '!/page.*of.*/ { if (m) print buf; buf=$0; m=1 } /page.*of.*/ { m=0 } END { if (m) print buf }' 1.txt
The awk command says the following: if the current line contains "page ... of", clear the flag, so the previous line held in buf is never printed (and the matching line itself is never buffered). Otherwise, print the previously buffered line if the flag is set, store the current line in buf, and set the flag; the output therefore lags the input by one line. The END block flushes the last buffered line, which would otherwise be lost.
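With that END fix in place, the sample file gives:
$ awk '!/page.*of.*/ { if (m) print buf; buf=$0; m=1 } /page.*of.*/ { m=0 } END { if (m) print buf }' 1.txt
testoing
testing
super crap blah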
Collect the matching lines plus their preceding lines, strip grep's -- group separators, then exclude all of those from the file (note the excluded lines are treated as patterns, not fixed strings):
grep -vf <(grep -B1 "page.*of" file | sed '/^--$/d') file
Not too familiar with sed, but here's a Perl expression to do the trick:
cat FILE | perl -e '@a = <STDIN>;
for ($i = 0; $i <= $#a; $i++) {
    if ($i > 0 && $a[$i] =~ /xxxx/) {
        $a[$i] = "";
        $a[$i-1] = "";
    }
}
print @a;'
Edit: where xxxx is what you are trying to match.
Thanks, I was trying to use the awk command given by Foo Bah to delete the matching line and the previous one. I have to use it multiple times, so I put the matching part in a variable. The given awk command works, but when using a variable it does not (i.e. it does not delete the matching and previous lines). I tried:
awk -vvar="page.*of.*" '!/$var/ { if (m) print buf; buf=$0; m=1} /$var/ {m=0}' 1.txt
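awk does not expand variables inside /.../: the pattern is taken as a literal regular expression, so /$var/ never matches your lines. Match against the variable explicitly with the ~ operator instead (a sketch, including the end-of-file flush from above):
awk -v var="page.*of.*" '$0 !~ var { if (m) print buf; buf=$0; m=1 } $0 ~ var { m=0 } END { if (m) print buf }' 1.txt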
