How to sort characters in a string? - unix

I would like to sort the characters in a string.
E.g.
echo cba | sort-command
abc
Is there a command that will allow me to do this or will I have to write an awk script to iterate over the string and sort it?

echo cba | grep -o . | sort |tr -d "\n"

Please find the following useful methods:
Shell
Sort string based on its characters:
echo cba | grep -o . | sort | tr -d "\n"
String separated by spaces:
echo 'dd aa cc bb' | tr " " "\n" | sort | tr "\n" " "
Perl
print (join "", sort split //,$_)
Ruby
ruby -e 'puts "dd aa cc bb".split(/\s+/).sort'
Bash
With bash you have to enumerate each character from a string, in general something like:
str="dd aa cc bb";
for (( i = 0; i < ${#str[#]}; i++ )); do echo "${str[$i]}"; done
For sorting array, please check: How to sort an array in bash?

This is cheating (because it uses Perl), but works. :-P
echo cba | perl -pe 'chomp; $_ = join "", sort split //'

Another perl one-liner
$ echo cba | perl -F -lane 'print sort #F'
abc
$ # for reverse order
$ echo xyz | perl -F -lane 'print reverse sort #F'
zyx
$ # or
$ echo xyz | perl -F -lane 'print sort {$b cmp $a} #F'
zyx
This will add newline to output as well, courtesy -l option
See Command switches for doc on all the options
The input is basically split character wise and saved in #F array
Then sorted #F is printed
This will also work line wise for given input file
$ cat ip.txt
idea
cold
spare
umbrella
$ perl -F -lane 'print sort #F' ip.txt
adei
cdlo
aeprs
abellmru

This would have been more appropriate as a comment to one of the grep -o . solutions (my reputation's not quite up to that low bar alas, damn my lurking), but I thought it worth mentioning that separating letters can be done more efficiently within the shell. It's always worth avoiding code, but this letsep function is pretty small:
letsep ()
{
INWORD="$1"
while [ "$INWORD" ]
do
echo ${INWORD:0:1}
INWORD=${INWORD#?}
done
}
. . . and outputs one letter per line for an input string of arbitrary length. For example, once letsep is defined, populating an array FLETRS with the letters of a string contained in variable FRED could be done (assuming contemporary bash) as:
readarray -t FLETRS < <(letsep $FRED)
. . . which for word-size strings runs about twice as fast as the equivalent :
readarray -t FLETRS < <(echo $FRED | grep -o .)
Whether this is worth setting up depends on the application. I've only measured this crudely, but the slower procedural code seems to maintain an advantage over the context switch up to ~60 chars (grep is obviously more efficient, but loading it is relatively expensive). If the above operation is taking place in one or more steps of a loop over an indeterminate number of executions, the difference in efficiency can add up (at which point some might argue for switching tools and rewriting regardless, but that's another set of tradeoffs).

Related

tcsh passing a variable inside a shell script

I've defined a variable inside a shell script and I want to use it. For some reason, I cannot pass it into to command line that I need it in.
Here's my script which fails at the last lines
#! /usr//bin/tcsh -f
if ( $# != 2 ) then
echo "Usage: jump_sorter.sh <jump> <field to sort on>"
exit;
endif
set a = `cat $1 | tail -1` #prepares last row for check with loop
set b = $2 #this is the value last row will be checked for
set counter = 0
foreach i ($a)
if ($i == "$b") then
set bingo = $counter
echo "$bingo is the field to print from $a"
endif
set counter = `expr $counter + 1`
end
echo $bingo #this prints the correct value for using in the command below
cat $1 | awk '{print($bingo)}' | sort | uniq -c | sort -nr #but this doesn't work.
#when I use $9 instead of $bingo, it does work.
How can I pass $bingo into the final line correctly, please?
Update: following the accepted answer from Martin Tournoij, the correct way to handle the "$" sign in the command is:
cat $1 | awk "{print("\$"$bingo)}" | sort | uniq -c | sort -nr
The reason it doesn't work is because variables are only substituted inside double quotes ("), not single quotes ('), and you're using single quotes:
cat $1 | awk '{print($bingo)}' | sort | uniq -c | sort -nr
The following should work:
cat $1 | awk "{print($bingo)}" | sort | uniq -c | sort -nr
You also have an error here:
#! /usr//bin/tcsh -f
That should be:
#!/usr/bin/tcsh -f
Note that csh isn't usually recommended for scripting; it has many quirks and lacks some features like functions. Unless you really need to use csh, it's recommended to use a Bourne shell (/bin/sh, bash, zsh) or a scripting language (Python, Ruby, etc.) instead.

Counting the number of 7 character words in a file that start with tree and do not end in u or v

I'm trying to count the number of 7 character words in a file that start with tree and do not end in u or v. I know how to specify the begin with tree and end in u or v condition in cat, but I'm not sure how to identify exactly 7 words or enter the conditions using wc. My pathname is /users/file1.txt.
This is the valid cat command(missing number of 7 character words)
cat /users/file1.txt | grep ^tree.*[!uv]
Below is the invalid wc command(missing number of 7 character words)
wc - w /users/file1.txt | grep ^tree.*[!uv]
Do you like perl? Here a one-liner:
cat /users/file1.txt | perl -lne 'if (/^(tree)(.{4}$)(?<![uv])/) { print $_ }'
sed -e 's/%//g' -e 's/\btree..[^uv]\b/%/g' -e 's/[^%]//g' -e 's/%/word /g' /users/file1.txt | wc -w
Don't let anyone steal our token.
Give us a token for what we want to count; match word boundaries to count to 7, negate match character in (u,v).
Get rid of everything else.
Turn our token into a friendly word plus a space.
Count 'em.
Reut's answer is very close.
But this will get you where you need:
cat /users/file1.txt | grep -wo 'tree..[^uv]' | wc -l
-w will get exact word matches
see that I ditched the .* and specified .. instead, as the total number of characters matched is 7
I also got rid of the ^tree so you can also match words that aren't at the beginning of the line.
Using grep and wc:
# echo the file # filter files # grep EXACT words # count
cat /users/file1.txt | grep ^tree.*[^u^v] | grep -o '[^\ ]\{7\}' | wc -w
Pipe walkthrough:
Echo content of the source file:
cat /users/file1.txt
Pass only files starting with "tree" and not ending with either "u" or "v":
grep ^tree.*[^u^v]
Forward any word that is composed of 7 non-spaces (if you want only letters use [a-zA-Z] instead of [^\ ]):
grep -o '[^\ ]\{7\}'
Count the words that made it here:
wc -w
Here is one other way using pretty basic bash:
count=0
for word in $(cat f.py)
do
if [ 7 -eq ${#word} ]
then
count=$((count+1))
fi
done
echo $count
Or in a single line:
count=0; for word in $(cat f.py); do if [ 7 -eq ${#word} ]; then count=$((count+1)); fi; done; echo $count
You may want to remove dots and commas from word.

Get last field using awk substr

I am trying to use awk to get the name of a file given the absolute path to the file.
For example, when given the input path /home/parent/child/filename I would like to get filename
I have tried:
awk -F "/" '{print $5}' input
which works perfectly.
However, I am hard coding $5 which would be incorrect if my input has the following structure:
/home/parent/child1/child2/filename
So a generic solution requires always taking the last field (which will be the filename).
Is there a simple way to do this with the awk substr function?
Use the fact that awk splits the lines in fields based on a field separator, that you can define. Hence, defining the field separator to / you can say:
awk -F "/" '{print $NF}' input
as NF refers to the number of fields of the current record, printing $NF means printing the last one.
So given a file like this:
/home/parent/child1/child2/child3/filename
/home/parent/child1/child2/filename
/home/parent/child1/filename
This would be the output:
$ awk -F"/" '{print $NF}' file
filename
filename
filename
In this case it is better to use basename instead of awk:
$ basename /home/parent/child1/child2/filename
filename
If you're open to a Perl solution, here one similar to fedorqui's awk solution:
perl -F/ -lane 'print $F[-1]' input
-F/ specifies / as the field separator
$F[-1] is the last element in the #F autosplit array
Another option is to use bash parameter substitution.
$ foo="/home/parent/child/filename"
$ echo ${foo##*/}
filename
$ foo="/home/parent/child/child2/filename"
$ echo ${foo##*/}
filename
Like 5 years late, I know, thanks for all the proposals, I used to do this the following way:
$ echo /home/parent/child1/child2/filename | rev | cut -d '/' -f1 | rev
filename
Glad to notice there are better manners
It should be a comment to the basename answer but I haven't enough point.
If you do not use double quotes, basename will not work with path where there is space character:
$ basename /home/foo/bar foo/bar.png
bar
ok with quotes " "
$ basename "/home/foo/bar foo/bar.png"
bar.png
file example
$ cat a
/home/parent/child 1/child 2/child 3/filename1
/home/parent/child 1/child2/filename2
/home/parent/child1/filename3
$ while read b ; do basename "$b" ; done < a
filename1
filename2
filename3
I know I'm like 3 years late on this but....
you should consider parameter expansion, it's built-in and faster.
if your input is in a var, let's say, $var1, just do ${var1##*/}. Look below
$ var1='/home/parent/child1/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/child3/filename'
$ echo ${var1##*/}
filename
you can skip all of that complex regex :
echo '/home/parent/child1/child2/filename' |
mawk '$!_=$-_=$NF' FS='[/]'
filename
2nd to last :
mawk '$!--NF=$NF' FS='/'
child2
3rd last field :
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$--NF' FS='[/]'
child1
4th-last :
mawk '$!--NF=$(--NF-!-FS)' FS='/'
echo '/home/parent/child000/child00/child0/child1/child2/filename' |
child0
echo '/home/parent/child1/child2/filename'
parent
major caveat :
- `gawk/nawk` has a slight discrepancy with `mawk` regarding
- how it tracks multiple,
- and potentially conflicting, decrements to `NF`,
- so other than the 1st solution regarding last field,
- the rest for now, are only applicable to `mawk-1/2`
just realized it's much much cleaner this way in mawk/gawk/nawk :
echo '/home/parent/child1/child2/filename' | …
'
awk ++NF FS='.+/' OFS= # updated such that
# root "/" still gets printed
'
filename
You can also use:
sed -n 's/.*\/\([^\/]\{1,\}\)$/\1/p'
or
sed -n 's/.*\/\([^\/]*\)$/\1/p'

Is there a way to ignore header lines in a UNIX sort?

I have a fixed-width-field file which I'm trying to sort using the UNIX (Cygwin, in my case) sort utility.
The problem is there is a two-line header at the top of the file which is being sorted to the bottom of the file (as each header line begins with a colon).
Is there a way to tell sort either "pass the first two lines across unsorted" or to specify an ordering which sorts the colon lines to the top - the remaining lines are always start with a 6-digit numeric (which is actually the key I'm sorting on) if that helps.
Example:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
500123TSTMY_RADAR00
222334NOTALINEOUT01
477821USASHUTTLES21
325611LVEANOTHERS00
should sort to:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00
(head -n 2 <file> && tail -n +3 <file> | sort) > newfile
The parentheses create a subshell, wrapping up the stdout so you can pipe it or redirect it as if it had come from a single command.
If you don't mind using awk, you can take advantage of awk's built-in pipe abilities
eg.
extract_data | awk 'NR<3{print $0;next}{print $0| "sort -r"}'
This prints the first two lines verbatim and pipes the rest through sort.
Note that this has the very specific advantage of being able to selectively sort parts
of a piped input. all the other methods suggested will only sort plain files which can be read multiple times. This works on anything.
In simple cases, sed can do the job elegantly:
your_script | (sed -u 1q; sort)
or equivalently,
cat your_data | (sed -u 1q; sort)
The key is in the 1q -- print first line (header) and quit (leaving the rest of the input to sort).
For the example given, 2q will do the trick.
The -u switch (unbuffered) is required for those seds (notably, GNU's) that would otherwise read the input in chunks, thereby consuming data that you want to go through sort instead.
Here is a version that works on piped data:
(read -r; printf "%s\n" "$REPLY"; sort)
If your header has multiple lines:
(for i in $(seq $HEADER_ROWS); do read -r; printf "%s\n" "$REPLY"; done; sort)
This solution is from here
You can use tail -n +3 <file> | sort ... (tail will output the file contents from the 3rd line).
head -2 <your_file> && nawk 'NR>2' <your_file> | sort
example:
> cat temp
10
8
1
2
3
4
5
> head -2 temp && nawk 'NR>2' temp | sort -r
10
8
5
4
3
2
1
It only takes 2 lines of code...
head -1 test.txt > a.tmp;
tail -n+2 test.txt | sort -n >> a.tmp;
For a numeric data, -n is required. For alpha sort, the -n is not required.
Example file:
$ cat test.txt
header
8
5
100
1
-1
Result:
$ cat a.tmp
header
-1
1
5
8
100
So here's a bash function where arguments are exactly like sort. Supporting files and pipes.
function skip_header_sort() {
if [[ $# -gt 0 ]] && [[ -f ${#: -1} ]]; then
local file=${#: -1}
set -- "${#:1:$(($#-1))}"
fi
awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
}
How it works. This line checks if there is at least one argument and if the last argument is a file.
if [[ $# -gt 0 ]] && [[ -f ${#: -1} ]]; then
This saves the file to separate argument. Since we're about to erase the last argument.
local file=${#: -1}
Here we remove the last argument. Since we don't want to pass it as a sort argument.
set -- "${#:1:$(($#-1))}"
Finally, we do the awk part, passing the arguments (minus the last argument if it was the file) to sort in awk. This was orignally suggested by Dave, and modified to take sort arguments. We rely on the fact that $file will be empty if we're piping, thus ignored.
awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
Example usage with a comma separated file.
$ cat /tmp/test
A,B,C
0,1,2
1,2,0
2,0,1
# SORT NUMERICALLY SECOND COLUMN
$ skip_header_sort -t, -nk2 /tmp/test
A,B,C
2,0,1
0,1,2
1,2,0
# SORT REVERSE NUMERICALLY THIRD COLUMN
$ cat /tmp/test | skip_header_sort -t, -nrk3
A,B,C
0,1,2
2,0,1
1,2,0
Here's a bash shell function derived from the other answers. It handles both files and pipes. First argument is the file name or '-' for stdin. Remaining arguments are passed to sort. A couple examples:
$ hsort myfile.txt
$ head -n 100 myfile.txt | hsort -
$ hsort myfile.txt -k 2,2 | head -n 20 | hsort - -r
The shell function:
hsort ()
{
if [ "$1" == "-h" ]; then
echo "Sort a file or standard input, treating the first line as a header.";
echo "The first argument is the file or '-' for standard input. Additional";
echo "arguments to sort follow the first argument, including other files.";
echo "File syntax : $ hsort file [sort-options] [file...]";
echo "STDIN syntax: $ hsort - [sort-options] [file...]";
return 0;
elif [ -f "$1" ]; then
local file=$1;
shift;
(head -n 1 $file && tail -n +2 $file | sort $*);
elif [ "$1" == "-" ]; then
shift;
(read -r; printf "%s\n" "$REPLY"; sort $*);
else
>&2 echo "Error. File not found: $1";
>&2 echo "Use either 'hsort <file> [sort-options]' or 'hsort - [sort-options]'";
return 1 ;
fi
}
This is the same as Ian Sherbin answer but my implementation is :-
cut -d'|' -f3,4,7 $arg1 | uniq > filetmp.tc
head -1 filetmp.tc > file.tc;
tail -n+2 filetmp.tc | sort -t"|" -k2,2 >> file.tc;
Another simple variation on all the others, reading a file once
HEADER_LINES=2
(head -n $HEADER_LINES; sort) < data-file.dat
With Python:
import sys
HEADER_ROWS=2
for _ in range(HEADER_ROWS):
sys.stdout.write(next(sys.stdin))
for row in sorted(sys.stdin):
sys.stdout.write(row)
cat file_name.txt | sed 1d | sort
This will do what you want.

counting records in unix file

This was an interview question, nevertheless still a programming question.
I have a unix file with two columns name and score. I need to display count of all the scores.
like
jhon 100
dan 200
rob 100
mike 100
the output should be
100 3
200 1
You only need to use built in unix utility to solve it, so i am assuming using shell scripts . or reg ex. or unix commands
I understand looping would be one way to do. store all the values u have already seen and then grep every record for unseen values. any other efficient way of doing it
Try this:
cut -d ' ' -f 2 < /tmp/foo | sort -n | uniq -c \
| (while read n v ; do printf "%s %s\n" "$v" "$n" ; done)
The initial cut could be replaced with another while read loop, which would be more resilient to input file format variations (extra whitespace). If some of the names consist in several words, simple field extraction will not work as easily, but sed can do it.
Otherwise, use your favorite programming language. Perl would probably shine. It is not difficult either in Java or even in C or Forth.
$ cat foo.txt
jhon 100
dan 200
rob 100
mike 100
$ awk '{print $2}' foo.txt | sort | uniq -c
3 100
1 200
Its a pity you can't do a count with sort or uniq alone.
Edit: I just noticed I have the count in front ... to get it exactly the same you can do:
$ awk '{print $2}' foo.txt | sort | uniq -c | awk '{ print $2 " " $1 }'
Not very complicated in perl:
#!/usr/bin/perl -w
use strict;
use warnings;
my %count = ();
while (<>) {
chomp;
my ($name, $score) = split(/ /);
$count{$score}++;
}
foreach my $key (sort keys %count) {
print "$key ", $count{$key}, "\n";
}
You could go with awk:
awk '/.*/ { a[$2] = a[$2] + 1; } END { for (x in a) { print x, " ", a[x] } }' record_file.txt
Alternatively with shell commands:
for i in `awk '{print $2}' inputfile | sort -u`
do
echo -n "$i "
grep $i inputfile | wc -l
done
The first awk command will give a list of all the different scores (e.g. 100 and 200) which then
the for loop iterates over, counting up each separately. Not very super efficient, but simple. If the file is not to big is should not be a too big problem.

Resources