Merge a string to a line extracted from a text file in UNIX

I want to prepend the string ABC to a line that I have extracted from a file.
The following command extracts lines 20-25 of file_ABC, keeps only the first column, and transposes it into a single row (line):
sed -n '20,25p' < file_ABC | awk '{print $1}' | paste -s
This is the result:
2727778 14734 0 0 0 2713044
I would like to add the string ABC at the start of this line:
ABC 2727778 14734 0 0 0 2713044
Any suggestion on how to do that?

A quick hack would be to use something like
printf 'ABC\t%s\n' "$(sed -n '20,25p' < file_ABC | awk '{print $1}' | paste -s)"
You could modify your initial command instead to use awk for everything, though:
awk '
BEGIN {printf "ABC"}
NR>=20 && NR<=25 {printf "\t%s", $1}
END {print ""}
' file_ABC
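If the line range or the prefix changes from run to run, the single-awk approach above can be wrapped in a small shell function. This is a sketch, not one of the posted answers: the name prefix_column and its argument order are hypothetical, and if the file is shorter than the range, only the lines that exist are used.

```shell
# prefix_column PREFIX FIRST LAST FILE   (hypothetical helper)
# Prints PREFIX followed by column 1 of lines FIRST..LAST of FILE,
# all tab-separated on one line.
prefix_column() {
    awk -v pre="$1" -v from="$2" -v to="$3" '
        NR >= from && NR <= to { line = line "\t" $1 }  # collect column 1
        NR > to { exit }                                 # stop reading; END still runs
        END { print pre line }
    ' "$4"
}

# Usage: prefix_column ABC 20 25 file_ABC
```

Because of the early exit, awk stops reading once line LAST has been collected, which saves time on large files.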

This might work for you (GNU sed):
sed '20,25{s/\s.*//;H};$!d;x;s/^/ABC/;s/\n/ /g' file
Gather up the first column fields by appending them to the hold space for rows 20 to 25 only. At the end of the file prepend ABC and replace the introduced newlines by spaces.

For fun, bash only
filename=file_ABC
words=("${filename##*_}")
i=0
while read -r word rest_of_line; do
((++i < 20 )) && continue
(( i > 25 )) && break
words+=("$word")
done < "$filename"
join() { local IFS=$1; shift; echo "$*"; }
join $'\t' "${words[@]}"
But this will be much slower than a single awk call.

If you want to keep it all in one awk script:
$ awk 'BEGIN {line="ABC"}
NR>=20 && NR<=25 {line=line FS $1}
NR==25 {print line; exit}' file
Improved version, as suggested by @EdMorton:
$ awk 'NR>=20 {line=line OFS $1}
NR==25 {print "ABC" line; exit}' file

Related

cut command --complement flag equivalent in AWK

I am new to writing shell scripts.
I am trying to write an awk command that does exactly the following:
cut --complement -c $IGNORE_RANGE file.txt > tmp
$IGNORE_RANGE can be any range, e.g. 1-5 or 5-10.
I cannot use cut because I am on AIX, whose cut does not support --complement. Is there any way to achieve this with awk?
Example:
file.txt
abcdef
123456
Output
cut --complement -c 1-2 file.txt > tmp
cdef
3456
cut --complement -c 4-5 file.txt > tmp
abcf
1236
cut --complement -c 1-5 file.txt > tmp
f
6
Could you please try the following, written and tested with the shown samples. The awk variable range holds start_position-end_position, so you can pass whatever range you need.
awk -v range="4-5" '
BEGIN{
split(range,array,"-")
}
{
print substr($0,1,array[1]-1) substr($0,array[2]+1)
}
' Input_file
Or, written more explicitly for readability:
awk -v range="4-5" '
BEGIN{
split(range,array,"-")
start=array[1]
end=array[2]
}
{
print substr($0,1,start-1) substr($0,end+1)
}
' Input_file
Explanation:
awk -v range="4-5" ' ##Start the awk program; range holds the positions we do NOT want to print.
BEGIN{ ##The BEGIN section runs once, before any input is read.
split(range,array,"-") ##Split range on "-" into array.
start=array[1] ##The first element is the start of the range.
end=array[2] ##The second element is the end of the range.
}
{
print substr($0,1,start-1) substr($0,end+1) ##Print characters 1 to start-1, then everything from end+1 on, skipping the unwanted range.
}
' Input_file ##Name of the input file.
You can do this in awk:
awk -v st=1 -v en=2 '{print substr($0, 1, st-1) substr($0, en+1)}' file
cdef
3456
Or:
awk -v st=4 -v en=5 '{print substr($0, 1, st-1) substr($0, en+1)}' file
abcf
1236
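The answers above take an N-M range; cut itself also accepts a single position (N) and open-ended ranges (N- and -M). A sketch of a wrapper that handles those forms too, assuming a single range (not a comma-separated list) is passed; the name cut_complement is hypothetical.

```shell
# cut_complement RANGE [FILE...]   (hypothetical helper)
# Emulates `cut --complement -c RANGE` for one range written as
# N, N-M, N- or -M. Characters are counted by awk's substr().
cut_complement() (
    range=$1
    shift
    awk -v range="$range" '
        BEGIN {
            n = split(range, r, "-")
            start = (r[1] == "") ? 1 : r[1] + 0                  # "-M" starts at column 1
            end = (n == 1) ? start : ((r[2] == "") ? 0 : r[2] + 0)  # "N" = one column, "N-" -> 0
        }
        {
            if (end == 0)
                print substr($0, 1, start - 1)                  # "N-": drop to end of line
            else
                print substr($0, 1, start - 1) substr($0, end + 1)
        }
    ' "$@"
)

# Usage: cut_complement "$IGNORE_RANGE" file.txt > tmp
```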

Extract the part of a file name to the left of the 2nd delimiter from the right

Below are the full file names.
qwertyuiop.abcdefgh.1234567890.txt
qwertyuiop.1234567890.txt
I tried:
awk -F'.' '{print $1}'
How can I use awk to extract the output below?
qwertyuiop.abcdefgh
qwertyuiop
Edit
I have a list of files in a directory,
and I am trying to extract the time, size, owner and file name into separate variables.
For the file names:
NAME=$(ls -lrt /tmp/qwertyuiop.1234567890.txt | awk -F'/' '{print $3}' | awk -F'.' '{print $1}')
$ echo $NAME
qwertyuiop
$
NAME=$(ls -lrt /tmp/qwertyuiop.abcdefgh.1234567890.txt | awk -F'/' '{print $3}' | awk -F'.' '{print $1}')
$ echo $NAME
qwertyuiop
$
Expected:
qwertyuiop.abcdefgh
With GNU awk and other versions that allow manipulation of NF:
$ awk -F. -v OFS=. '{NF-=2} 1' ip.txt
qwertyuiop.abcdefgh
qwertyuiop
NF-=2 effectively deletes the last two fields.
1 is an awk idiom to print the contents of $0.
Note that this assumes there are at least two fields in every line; otherwise you'd get an error.
A similar concept with perl; it prints an empty line if the line has fewer than 3 fields:
$ perl -F'\.' -lane 'print join ".", @F[0..$#F-2]' ip.txt
qwertyuiop.abcdefgh
qwertyuiop
With sed, lines with fewer than 3 fields are preserved unchanged:
$ sed 's/\.[^.]*\.[^.]*$//' ip.txt
qwertyuiop.abcdefgh
qwertyuiop
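Assigning to NF only works in some awks, as the answer above notes. A sketch that rebuilds the leading fields instead, so it should run under any POSIX awk; the name drop_last_two is hypothetical. Like the sed answer, it leaves lines with fewer than three fields unchanged.

```shell
# drop_last_two [FILE...]   (hypothetical helper)
# Removes the last two dot-separated fields without assigning to NF.
drop_last_two() {
    awk -F. '{
        if (NF < 3) { print; next }   # too few fields: print the line as-is
        out = $1
        for (i = 2; i <= NF - 2; i++)
            out = out "." $i          # rebuild fields 1..NF-2 with the dot separator
        print out
    }' "$@"
}

# Usage: drop_last_two ip.txt
```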
EDIT: Inspired by Sundeep's solution, adding this one to the mix too:
awk 'BEGIN{FS=OFS="."} {$(NF-1)=$NF="";sub(/\.+$/,"")} 1' Input_file
Could you please try the following:
awk -F'.' '{for(i=(NF-1);i<=NF;i++){$i=""};sub(/\.+$/,"")} 1' OFS="." Input_file
OR
awk 'BEGIN{FS=OFS="."} {for(i=(NF-1);i<=NF;i++){$i=""};sub(/\.+$/,"")} 1' Input_file
Explanation:
awk '
BEGIN{ ##BEGIN section of the awk program.
FS=OFS="." ##Set both FS and OFS to a dot, as in the OP's sample Input_file.
} ##Close the BEGIN section.
{
for(i=(NF-1);i<=NF;i++){ ##Loop i from NF-1 to NF, i.e. over the last two fields.
$i="" ##Set the respective field to an empty string.
} ##Close the for loop.
sub(/\.+$/,"") ##Remove the run of trailing dots left behind by the emptied fields.
}
1 ##1 is always true, so the (edited) current line is printed.
' Input_file ##Name of the input file.
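For the use case in the Edit (a single path such as /tmp/qwertyuiop.abcdefgh.1234567890.txt), the shell's own parameter expansion can do this without parsing ls output or running awk. This is a swapped-in technique rather than one of the answers above, and the function name is hypothetical.

```shell
# name_without_last_two PATH   (hypothetical helper)
# Runs in a subshell so the temporary variable does not leak.
name_without_last_two() (
    name=${1##*/}      # strip the directory part (longest match of */)
    name=${name%.*}    # drop the last .field, e.g. .txt
    name=${name%.*}    # drop the next .field, e.g. .1234567890
    printf '%s\n' "$name"
)

# Usage: NAME=$(name_without_last_two /tmp/qwertyuiop.abcdefgh.1234567890.txt)
```

Unlike the sed answer, a name with only one dot still loses its extension (readme.txt becomes readme), since the second ${name%.*} simply leaves a dot-free name alone.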

AWK to print field $2 first, then field $1

Here is the input(sample):
name1@gmail.com|com.emailclient.account
name2@msn.com|com.socialsite.auth.account
I'm trying to achieve this:
Emailclient name1@gmail.com
Socialsite name2@msn.com
If I use AWK like this:
cat foo | awk 'BEGIN{FS="|"} {print $2 " " $1}'
it messes up the output by overlaying field 1 on the top of field 2.
Any tips/suggestions? Thank you.
A couple of general tips (besides the DOS line ending issue):
cat is for concatenating files; it's not the only tool that can read files! If a command doesn't read files itself, use redirection, like command < file.
You can set the field separator with the -F option so instead of:
cat foo | awk 'BEGIN{FS="|"} {print $2 " " $1}'
Try:
awk -F'|' '{print $2" "$1}' foo
This will output:
com.emailclient.account name1@gmail.com
com.socialsite.auth.account name2@msn.com
To get the desired output you could do a variety of things. I'd probably split() the second field:
awk -F'|' '{split($2,a,".");print a[2]" "$1}' file
emailclient name1@gmail.com
socialsite name2@msn.com
Finally, converting the first character to uppercase is a bit of a pain in awk, as there is no built-in ucfirst() function:
awk -F'|' '{split($2,a,".");print toupper(substr(a[2],1,1)) substr(a[2],2),$1}' file
Emailclient name1@gmail.com
Socialsite name2@msn.com
If you want something more concise (at the cost of an extra sub-process; note that \U in the replacement is a GNU sed extension), you could do:
awk -F'|' '{split($2,a,".");print a[2]" "$1}' file | sed 's/^./\U&/'
Emailclient name1@gmail.com
Socialsite name2@msn.com
Use a dot or a pipe as the field separator:
awk -v FS='[.|]' '{
printf "%s%s %s.%s\n", toupper(substr($4,1,1)), substr($4,2), $1, $2
}' << END
name1@gmail.com|com.emailclient.account
name2@msn.com|com.socialsite.auth.account
END
gives:
Emailclient name1@gmail.com
Socialsite name2@msn.com
Maybe your file has CRLF line terminators, i.e. every line ends with \r\n.
In that case awk sees the last field as $2 followed by \r, and \r moves the cursor back to the start of the line.
So {print $2 " " $1} prints $2, then \r, then " " $1: the terminal returns to the start of the line before writing $1, which is why field 1 overlays field 2.
The awk is OK. I'm guessing the file comes from a Windows system and has a CR (^M, ASCII 0x0d) at the end of each line.
This will cause the cursor to go to the start of the line after $2.
Use dos2unix or vi with :se ff=unix to get rid of the CRs.
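If converting the file first is not an option, the carriage return can also be stripped inside awk before the fields are used. A sketch combining that with the capitalisation answer above; it assumes the CRLF diagnosis is right and that the second field has at least two dot-separated parts. The function name reorder_fields is hypothetical.

```shell
# reorder_fields [FILE...]   (hypothetical helper)
reorder_fields() {
    awk -F'|' '{
        sub(/\r$/, "")          # drop a trailing CR; changing $0 re-splits the fields
        split($2, a, ".")       # e.g. com.emailclient.account -> a[2] = emailclient
        print toupper(substr(a[2], 1, 1)) substr(a[2], 2), $1
    }' "$@"
}

# Usage: reorder_fields foo
```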

Is there a way to ignore header lines in a UNIX sort?

I have a fixed-width-field file which I'm trying to sort using the UNIX (Cygwin, in my case) sort utility.
The problem is there is a two-line header at the top of the file which is being sorted to the bottom of the file (as each header line begins with a colon).
Is there a way to tell sort either to "pass the first two lines across unsorted" or to use an ordering that sorts the colon lines to the top? The remaining lines always start with a 6-digit number (which is actually the key I'm sorting on), if that helps.
Example:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
500123TSTMY_RADAR00
222334NOTALINEOUT01
477821USASHUTTLES21
325611LVEANOTHERS00
should sort to:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00
(head -n 2 <file> && tail -n +3 <file> | sort) > newfile
The parentheses create a subshell, wrapping up the stdout so you can pipe it or redirect it as if it had come from a single command.
If you don't mind using awk, you can take advantage of awk's built-in pipe abilities, e.g.:
extract_data | awk 'NR<3{print $0;next}{print $0| "sort -r"}'
This prints the first two lines verbatim and pipes the rest through sort.
Note that this has the specific advantage of being able to selectively sort parts of a piped input. Methods that read the input twice (such as head plus tail) only work on plain files, which can be read multiple times; this works on anything.
In simple cases, sed can do the job elegantly:
your_script | (sed -u 1q; sort)
or equivalently,
cat your_data | (sed -u 1q; sort)
The key is in the 1q -- print first line (header) and quit (leaving the rest of the input to sort).
For the example given, 2q will do the trick.
The -u switch (unbuffered) is required for those seds (notably, GNU's) that would otherwise read the input in chunks, thereby consuming data that you want to go through sort instead.
Here is a version that works on piped data:
(read -r; printf "%s\n" "$REPLY"; sort)
If your header has multiple lines:
(for i in $(seq $HEADER_ROWS); do read -r; printf "%s\n" "$REPLY"; done; sort)
This solution is from here
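The same read-based idea can be packaged as a function that takes the number of header lines and passes any remaining options to sort, without needing seq. This is a sketch; the name sort_after_header is hypothetical. It relies on the shell's read builtin consuming a pipe one line at a time, which leaves the remaining lines for sort.

```shell
# sort_after_header N [sort-options...]   (hypothetical helper)
# Copies the first N lines of stdin unchanged, then sorts the rest.
sort_after_header() (
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ] && IFS= read -r line; do
        printf '%s\n' "$line"      # header line, passed through as-is
        i=$((i + 1))
    done
    sort "$@"                      # sort reads whatever is left on stdin
)

# Usage: your_script | sort_after_header 2
```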
You can use tail -n +3 <file> | sort ... (tail will output the file contents from the 3rd line).
head -2 <your_file> && nawk 'NR>2' <your_file> | sort
example:
> cat temp
10
8
1
2
3
4
5
> head -2 temp && nawk 'NR>2' temp | sort -r
10
8
5
4
3
2
1
It only takes 2 lines of code...
head -1 test.txt > a.tmp;
tail -n+2 test.txt | sort -n >> a.tmp;
For numeric data, -n is required; for an alphabetical sort, omit -n.
Example file:
$ cat test.txt
header
8
5
100
1
-1
Result:
$ cat a.tmp
header
-1
1
5
8
100
So here's a bash function whose arguments are exactly like sort's. It supports both files and pipes.
function skip_header_sort() {
if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
local file=${@: -1}
set -- "${@:1:$(($#-1))}"
fi
awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
}
How it works: this line checks whether there is at least one argument and whether the last argument is a file.
if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
This saves the file name in a separate variable, since we're about to remove the last argument.
local file=${@: -1}
Here we remove the last argument, since we don't want to pass it to sort as an option.
set -- "${@:1:$(($#-1))}"
Finally, we do the awk part, passing the arguments (minus the last one if it was the file) to sort inside awk. This was originally suggested by Dave, and modified to take sort arguments. We rely on the fact that $file will be empty if we're piping, so it is ignored.
awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
Example usage with a comma separated file.
$ cat /tmp/test
A,B,C
0,1,2
1,2,0
2,0,1
# SORT NUMERICALLY SECOND COLUMN
$ skip_header_sort -t, -nk2 /tmp/test
A,B,C
2,0,1
0,1,2
1,2,0
# SORT REVERSE NUMERICALLY THIRD COLUMN
$ cat /tmp/test | skip_header_sort -t, -nrk3
A,B,C
0,1,2
2,0,1
1,2,0
Here's a bash shell function derived from the other answers. It handles both files and pipes. The first argument is the file name, or '-' for stdin. The remaining arguments are passed to sort. A couple of examples:
$ hsort myfile.txt
$ head -n 100 myfile.txt | hsort -
$ hsort myfile.txt -k 2,2 | head -n 20 | hsort - -r
The shell function:
hsort ()
{
if [ "$1" == "-h" ]; then
echo "Sort a file or standard input, treating the first line as a header.";
echo "The first argument is the file or '-' for standard input. Additional";
echo "arguments to sort follow the first argument, including other files.";
echo "File syntax : $ hsort file [sort-options] [file...]";
echo "STDIN syntax: $ hsort - [sort-options] [file...]";
return 0;
elif [ -f "$1" ]; then
local file=$1;
shift;
(head -n 1 "$file" && tail -n +2 "$file" | sort "$@");
elif [ "$1" == "-" ]; then
shift;
(read -r; printf "%s\n" "$REPLY"; sort "$@");
else
>&2 echo "Error. File not found: $1";
>&2 echo "Use either 'hsort <file> [sort-options]' or 'hsort - [sort-options]'";
return 1 ;
fi
}
This is the same as Ian Sherbin's answer, but my implementation is:
cut -d'|' -f3,4,7 $arg1 | uniq > filetmp.tc
head -1 filetmp.tc > file.tc;
tail -n+2 filetmp.tc | sort -t"|" -k2,2 >> file.tc;
Another simple variation on all the others, reading the file only once:
HEADER_LINES=2
(head -n $HEADER_LINES; sort) < data-file.dat
With Python:
import sys
HEADER_ROWS=2
for _ in range(HEADER_ROWS):
    sys.stdout.write(next(sys.stdin))
for row in sorted(sys.stdin):
    sys.stdout.write(row)
cat file_name.txt | sed 1d | sort
This sorts everything after the first line. Note that sed 1d deletes the header rather than keeping it.

counting records in unix file

This was an interview question, nevertheless still a programming question.
I have a Unix file with two columns, name and score. I need to display the count of each score.
For example:
jhon 100
dan 200
rob 100
mike 100
the output should be
100 3
200 1
You may only use built-in Unix utilities to solve it, so I am assuming shell scripts, regular expressions or Unix commands.
I understand looping would be one way to do it: store all the values you have already seen and then grep every record for unseen values. Is there a more efficient way of doing it?
Try this:
cut -d ' ' -f 2 < /tmp/foo | sort -n | uniq -c \
| (while read n v ; do printf "%s %s\n" "$v" "$n" ; done)
The initial cut could be replaced with another while read loop, which would be more resilient to variations in the input format (extra whitespace). If some of the names consist of several words, simple field extraction will not work as easily, but sed can do it.
Otherwise, use your favorite programming language. Perl would probably shine. It is not difficult either in Java or even in C or Forth.
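A sketch of the while read variant mentioned above, replacing the initial cut; the function name count_scores is hypothetical, and it assumes each name is a single word.

```shell
# count_scores < FILE   (hypothetical helper)
# Prints "score count" for each distinct score, lowest score first.
# read splits on any run of blanks, so extra spaces or tabs are tolerated;
# a multi-word name would make score receive the rest of the line.
count_scores() {
    while read -r name score; do
        printf '%s\n' "$score"
    done | sort -n | uniq -c | while read -r n v; do
        printf '%s %s\n' "$v" "$n"   # swap uniq's "count value" order
    done
}

# Usage: count_scores < /tmp/foo
```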
$ cat foo.txt
jhon 100
dan 200
rob 100
mike 100
$ awk '{print $2}' foo.txt | sort | uniq -c
3 100
1 200
It's a pity you can't do the count with sort or uniq alone.
Edit: I just noticed I have the count in front; to get the output exactly as requested, you can do:
$ awk '{print $2}' foo.txt | sort | uniq -c | awk '{ print $2 " " $1 }'
Not very complicated in perl:
#!/usr/bin/perl -w
use strict;
use warnings;
my %count = ();
while (<>) {
chomp;
my ($name, $score) = split(/ /);
$count{$score}++;
}
foreach my $key (sort keys %count) {
print "$key ", $count{$key}, "\n";
}
You could go with awk:
awk '{ a[$2]++ } END { for (x in a) print x, a[x] }' record_file.txt
The order of for (x in a) is unspecified, so pipe the output through sort -n if the order matters.
Alternatively, with shell commands:
for i in $(awk '{print $2}' inputfile | sort -u)
do
printf '%s ' "$i"
grep -c " $i\$" inputfile   # anchor the score at end of line so 100 does not also match 1000
done
The awk command gives the list of distinct scores (e.g. 100 and 200), which the for loop then iterates over, counting each one separately. It is not very efficient, but it is simple; if the file is not too big, it should not be a problem.
