How to connect words in a text file - unix

I have a file in the following format:
B: that
I: White
I: House
B: the
I: emergency
I: rooms
B: trauma
I: centers
What I need to do is read the file line by line from the top. If a line begins with B:, remove the B: prefix.
If it begins with I:, remove the I: prefix and append the word to the previous line (which has already been processed by the same rules).
Expected Output:
that White House
the emergency rooms
trauma centers
What I tried:
while read line
do
string=$line
echo $string | grep "B:" 1>/dev/null
if [ `echo $?` -eq 0 ] //if start with " B: "
then
$newstring= echo ${var:4} //cut first 4 characters which including B: and space
echo $string | grep "I:" 1>/dev/null
if [ `echo $?` -eq 0 ] //if start with " I: "
then
$newstring= echo ${var:4} //cut first 4 characters which including I: and space
done < file.txt
What I don't know is how to write the result back to the line (in the file) and how to join a line to the previously processed one.

Use awk to print the second field of the I: and B: records. The variable first controls when a newline is output.
/B:/ matches lines containing B:, which mark the start of a record. If the record is not the first one, a newline is printed before the data in $2.
If the line matches I:, the data in $2 (the second field, which follows I:) is printed.
awk 'BEGIN{first=1}
/B:/ { if (first) first=0; else print ""; printf("%s ", $2); }
/I:/ { printf("%s ", $2) }
END {print ""}' filename
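A variant of the same idea, as a minimal sketch: it anchors the patterns at the start of the line and builds each output line in a variable, so no trailing space is emitted. The function name join_chunks is made up for illustration, and the sketch assumes every record looks like "B: word" or "I: word", optionally with leading whitespace.

```shell
# join_chunks: hypothetical helper name, not part of the original answer.
# Reads "B: word" / "I: word" records on stdin, prints one line per B-group.
join_chunks() {
  awk '
    /^[[:space:]]*B:/ {              # a B: record starts a new output line
      if (have) print line           # flush the previous group, if any
      line = $2; have = 1
      next
    }
    /^[[:space:]]*I:/ {              # an I: record continues the current line
      line = (have ? line " " : "") $2
      have = 1
    }
    END { if (have) print line }     # flush the last group
  '
}

# Usage with the sample data from the question:
printf 'B: that\nI: White\nI: House\nB: the\nI: emergency\nI: rooms\nB: trauma\nI: centers\n' | join_chunks
```

Because the result goes to standard output, writing it back to the file would still need a redirect to a temporary file followed by a rename.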

awk -F":" '{a[NR]=$0}
/^ B:/{print line;line=$2}
/^ I:/{line=line" "$2}
END{
if(a[NR]!~/^B/)
{print line}
}' Your_file

awk '/^B/ {printf "\n%s",$2} /^I/ {printf " %s",$2}' file
that White House
the emergency rooms
trauma centers
It can be shortened a bit:
awk '/./ {printf /^B/?"\n%s":" %s",$2}' file

There is an interesting solution using awk auto-split on RS patterns. Note that this is a bit sensitive to variations in the input format:
<infile awk 1 RS='(^|\n)B: ' | awk 1 RS='\n+I: ' ORS=' ' | grep -v '^ *$'
Output:
that White House
the emergency rooms
trauma centers
This works at least with GNU awk and mawk (Mike's awk).

This might work for you (GNU sed):
sed -r ':a;$!N;s/\n$//;s/\n\s*I://;ta;s/B://g;s/^\s*//;P;D' file
or:
sed -e ':a' -e '$!N' -e 's/\n$//' -e 's/\n\s*I://' -e 'ta' -e 's/B://g' -e 's/^\s*//' -e 'P' -e 'D' file

Related

cut command --complement flag equivalent in AWK

I am new to writing shell scripts.
I am trying to write an AWK command that does exactly what the command below does:
cut --complement -c $IGNORE_RANGE file.txt > tmp
$IGNORE_RANGE can be any range, say 1-5 or 5-10.
I cannot use cut because I am on AIX, and cut on AIX does not support --complement. Is there any way to achieve this using AWK?
Example:
file.txt
abcdef
123456
Output
cut --complement -c 1-2 file.txt > tmp
cdef
3456
cut --complement -c 4-5 file.txt > tmp
abcf
1236
cut --complement -c 1-5 file.txt > tmp
f
6
Could you please try the following, written and tested with the shown samples. The awk variable range must have the form start_position-end_position; pass whatever range you need.
awk -v range="4-5" '
BEGIN{
split(range,array,"-")
}
{
print substr($0,1,array[1]-1) substr($0,array[2]+1)
}
' Input_file
Or, to make it easier to follow, try the following:
awk -v range="4-5" '
BEGIN{
split(range,array,"-")
start=array[1]
end=array[2]
}
{
print substr($0,1,start-1) substr($0,end+1)
}
' Input_file
Explanation: a detailed, commented version of the above.
awk -v range="4-5" ' ##Starting awk program from here creating range variable which has range value of positions which we do not want to print in lines.
BEGIN{ ##Starting BEGIN section of this program from here.
split(range,array,"-") ##Splitting range variable into array with delimiter of - here.
start=array[1] ##Assigning 1st element of array to start variable here.
end=array[2] ##Assigning 2nd element of array to end variable here.
}
{
print substr($0,1,start-1) substr($0,end+1) ##Printing sub-string of current line from 1 to till value of start-1 and then printing from end+1 which basically means will skip that range of characters which OP does not want to print.
}
' Input_file ##Mentioning Input_file name here.
You can do this in awk:
awk -v st=1 -v en=2 '{print substr($0, 1, st-1) substr($0, en+1)}' file
cdef
3456
Or:
awk -v st=4 -v en=5 '{print substr($0, 1, st-1) substr($0, en+1)}' file
abcf
1236
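To plug the asker's $IGNORE_RANGE straight in, the substr approach above can be wrapped in a small shell function. This is only a sketch: the name cut_complement is hypothetical, and it handles a single closed range of the form START-END, not lists such as 1-2,5 or open ranges such as 3-.

```shell
# cut_complement: hypothetical wrapper around the substr() approach above.
# $1 is a range "START-END" (1-based, inclusive); input is read from stdin.
cut_complement() {
  awk -v range="$1" '
    BEGIN {
      n = split(range, part, "-")
      if (n != 2 || part[1] !~ /^[0-9]+$/ || part[2] !~ /^[0-9]+$/) {
        # Report malformed ranges on stderr and stop before reading input.
        print "cut_complement: expected START-END, got " range | "cat 1>&2"
        exit 2
      }
      start = part[1] + 0
      end   = part[2] + 0
    }
    # Keep everything before START and everything after END.
    { print substr($0, 1, start - 1) substr($0, end + 1) }
  '
}

# Usage, mirroring the question:
IGNORE_RANGE=4-5
printf 'abcdef\n123456\n' | cut_complement "$IGNORE_RANGE"
```

Like the answers above, this counts characters as awk's substr does, which may differ from byte positions in multibyte locales.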

Merge a string to a line extracted from a text file in UNIX

I wanted to merge a string ABC to a line that I have extracted from a file.
The following command is used to extract the lines 20-25 in file_ABC, take only the first column, which is then transposed to become a row (or line).
sed -n '20,25p' < file_ABC | awk '{print $1}' | paste -s
This is the result:
2727778 14734 0 0 0 2713044
I would like to add at the first position of this line the string ABC.
ABC 2727778 14734 0 0 0 2713044
Any suggestion on how to do that?
A quick hack would be to use something like
printf 'ABC\t%s\n' "$(sed -n '20,25p' < file_ABC | awk '{print $1}' | paste -s)"
You could modify your initial command instead to use awk for everything, though:
awk '
BEGIN {printf "ABC"}
NR>=20 && NR<=25 {printf "\t%s", $1}
END {print ""}
' file_ABC
This might work for you (GNU sed):
sed '20,25{s/\s.*//;H};$!d;x;s/^/ABC/;s/\n/ /g' file
Gather up the first column fields by appending them to the hold space for rows 20 to 25 only. At the end of the file prepend ABC and replace the introduced newlines by spaces.
For fun, bash only
filename=file_ABC
words=("${filename##*_}")
i=0
while read -r word rest_of_line; do
((++i < 20 )) && continue
(( i > 25 )) && break
words+=("$word")
done < "$filename"
join() { local IFS=$1; shift; echo "$*"; }
join $'\t' "${words[@]}"
But this will be much slower than a single awk call.
If you want to keep it all in one script:
$ awk 'BEGIN {line="ABC"}
NR>=20 && NR<=25 {line=line FS $1}
NR==25 {print line; exit}' file
Improved version, as suggested by @EdMorton:
$ awk 'NR>=20 {line=line OFS $1}
NR==25 {print "ABC" line; exit}' file
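The single-awk approach generalizes to any label and line range. The following is a sketch under assumptions: the function name prefix_column_row and its parameter order (label, first line, last line, file) are invented for illustration.

```shell
# prefix_column_row: hypothetical helper. Prints LABEL followed by column 1
# of lines FIRST..LAST of FILE, all on one line, separated by OFS (a space).
prefix_column_row() {
  awk -v label="$1" -v first="$2" -v last="$3" '
    NR >= first + 0 { line = line OFS $1 }   # collect column 1 inside the range
    NR == last + 0  { exit }                 # stop reading once the range is done
    END             { print label line }     # END still runs after exit
  ' "$4"
}

# Usage with a generated 30-line file:
tmp=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 30; i++) print i * 10, "other" }' > "$tmp"
prefix_column_row ABC 20 25 "$tmp"
rm -f "$tmp"
```

Printing in END rather than at the last line of the range means a file shorter than the range still produces a (shorter) line instead of no output.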

dealing with % symbol in a file using awk command

I am using this command
awk '{printf $1; for (i=2;i<=10;i++) {printf OFS $i} printf "\n"}' FS='|' OFS='|' file.txt >>new_file.txt
It works fine as long as no record in the file contains a % symbol.
My requirement:
Input:
A|B|C|D|E
A|B
Output:
A|B|C|D|E|||||
A|B||||||||
Sample value that triggers the error: '20% OFF ONLINE PRICE MATCH'
How do I handle this issue?
Error - awk: There are not enough parameters in printf statement |20% OFF ONLINE PRICE MATCH.
The first argument to printf is actually the format string, which gets printed as-is if there are no formatting flags (%x) and it's the only argument. Unless you are in control of the string, you should always provide two arguments to printf, exactly for this reason, that is, to guard against formatting flags (expected or otherwise) occurring in the supplied strings. In your case, change the printf statements to
printf "%s", $1
and
printf "%s", OFS $i
and you should be fine.
By way of illustration:
$ echo '20% OFF ONLINE PRICE MATCH' | awk -F\| '{ printf $1 }'
awk: weird printf conversion % O
input record number 1, file
source line number 1
awk: not enough args in printf(20% OFF ONLINE PRICE MATCH)
input record number 1, file
source line number 1
$ echo '20% OFF ONLINE PRICE MATCH' | awk -F\| '{ printf "%s\n", $1 }'
20% OFF ONLINE PRICE MATCH
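Putting both changes into the asker's command gives something like the sketch below. The function name pad_fields and the optional field-count argument are assumptions added for illustration; the printf "%s" pattern is the fix described above.

```shell
# pad_fields: hypothetical helper. Pads every "|"-separated record read from
# stdin to N fields (default 10), printing data only through "%s" so that
# a % inside the data is never interpreted as a format specifier.
pad_fields() {
  awk -v n="${1:-10}" '
    BEGIN { FS = OFS = "|" }
    {
      printf "%s", $1
      for (i = 2; i <= n + 0; i++)
        printf "%s%s", OFS, $i     # fields beyond NF are empty strings
      printf "\n"
    }
  '
}

# Usage:
printf '%s\n' 'A|B|C|D|E' 'A|B' '20% OFF ONLINE PRICE MATCH|X' | pad_fields 10
```

Reading a field beyond NF yields an empty string without changing the record, which is what produces the trailing empty fields.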

Unix find line number of a string in a file using awk/grep

I'm trying to find the position of a string:
awk -F : '{if ( $0 ~ /Red Car/) print $0}' /var/lab/lab2/rusiuot/stud2001 | tail -l
Somehow I need to find the line number of Red Car. Is it possible to do that using awk or grep?
You can do
awk '/Red Car/ {print NR}' /var/lab/lab2/rusiuot/stud2001
This will print the line number of every line that contains Red Car.
If you would like the line numbers printed after the contents of the file:
awk '/Red Car/ {a[NR]} 1; END {print "\nlines with pattern";for (i in a) printf "%s ",i;print ""}' file
Try something like:
grep -n "Red Car" /var/lab/lab2/rusiuot/stud2001 | cut -d":" -f 1
The -n option displays the line number along with each line where the pattern is found; cut then keeps only the number.
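If the search text should be taken literally rather than as a regular expression, awk's index() can be used instead of a /pattern/. This is a minimal sketch; the function name match_line_numbers is hypothetical.

```shell
# match_line_numbers: hypothetical helper. Prints the number of every line
# of FILE ($2) that contains TEXT ($1) as a literal substring.
# Note: awk -v still processes backslash escapes in TEXT.
match_line_numbers() {
  awk -v text="$1" 'index($0, text) { print NR }' "$2"
}

# Usage with a small sample file:
tmp=$(mktemp)
printf 'Blue Car\nRed Car\nGreen Van\nOld Red Car\n' > "$tmp"
match_line_numbers 'Red Car' "$tmp"                # every matching line number
match_line_numbers 'Red Car' "$tmp" | tail -n 1    # only the last match
rm -f "$tmp"
```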

How to use awk to do file copy. Copy using split in awk not working

I am missing something subtle. I tried running the command below, but it didn't work. Can you please help?
ls | awk '{ split($1,a,".gz")} {cp " " $1 " " a[1]".gz"}'
However, when I print the command instead of running it, the expected copy command is shown:
ls | awk '{ split($1,a,".gz")} {print "cp" " " $1 " " a[1]".gz"}'
I'm not sure where the problem is. Any pointers would be helpful.
To summarize some of the comments and point out what's wrong with the first example:
ls | awk '{ split($1,a,".gz")} {cp " " $1 " " a[1]".gz"}'
^ unassigned variable
Here cp is an unassigned awk variable, so it defaults to "" and is not treated as the program cp. If you do the following in a directory with one file, test.gz_monkey, you'll see why:
ls | awk '{split($1,a,".gz"); cmd=cp " " $1 " " a[1] ".gz"; print ">>" cmd "<<" }'
results in
>> test.gz_monkey test.gz<<
^ the space here is because cp was "" when cmd was assigned
Notice that you can separate statements with a ; instead of having two action blocks. Awk does support running commands in a subshell: one way is system, another is getline. With the following changes, your concept can work:
ls | awk '{split($1,a,".gz"); cmd="cp "$1" "a[1]".gz"; system(cmd) }'
^ notice cp has moved inside a string
Another thing to notice - ls isn't a good choice for only finding files in the current directory. Instead, try find:
find . -type f -name "*.gz_*" | awk '{split($1,a,".gz"); cmd="cp "$1" "a[1]".gz"; system(cmd) }'
while personally, I think something like the following is more readable:
find . -type f -name "*.gz_*" | awk '{split($1,a,".gz"); system(sprintf( "cp %s %s.gz", $1, a[1])) }'
Why are you using awk at all? Try:
for f in *; do cp "$f" "${f%.gz*}.gz"; done
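The loop above copies every entry in the directory, including files that already end in .gz. A slightly more defensive sketch restricts the glob to names like test.gz_monkey and skips anything that is not a regular file. The function names gz_target and copy_gz_variants are hypothetical, and the "*.gz_*" naming pattern is taken from the find examples earlier in this thread.

```shell
# gz_target: hypothetical helper. Maps e.g. "test.gz_monkey" to "test.gz"
# by dropping everything from the last ".gz" onward and re-adding ".gz".
gz_target() {
  printf '%s\n' "${1%.gz*}.gz"
}

# copy_gz_variants: copy every "*.gz_*" file in DIR (default: .) to its
# ".gz" name, in plain shell without awk.
copy_gz_variants() {
  dir=${1:-.}
  for f in "$dir"/*.gz_*; do
    [ -f "$f" ] || continue        # skip the unexpanded glob and non-files
    cp "$f" "$(gz_target "$f")"
  done
}

# Usage:
gz_target test.gz_monkey
```

Quoting "$f" keeps file names with spaces intact, which the awk system() variants above do not handle.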
