Reversing order of columns in file - unix

I'm working with a database whose lines look like this:
age:position:name:
I don't know why the database uses this order, but to make it easier to read and manipulate, I would like to reorder the fields like this:
name:age:position.
I'm currently doing it with standard Unix tools like this:
datab=$(cut -d : -f1,2,3 inf.major)
echo "$datab" | cut -d : -f1 > age
echo "$datab" | cut -d : -f2 > pos
echo "$datab" | cut -d : -f3 > name
paste -d : name age pos > inf.major
This is quite laborious. It would be fine if the data had only a few fields separated by :, but it has more than 10. Is there a way to achieve the same result dynamically or faster?

You can use awk/gawk:
gawk -F":" '{print $3":"$2":"$1;}' inf.major
This will separate each line of your file at : and print the first three elements in reversed order.
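The question mentions more than 10 fields, so hard-coding $3":"$2":"$1 does not scale. A minimal sketch, assuming the goal is to reverse every colon-separated field no matter how many there are, loops from NF down to 1. The function name reverse_fields is hypothetical.

```shell
# reverse_fields (hypothetical name): prints every colon-separated field
# of each input line in reverse order, whatever the number of fields.
reverse_fields() {
    awk -F: '{
        out = $NF                                   # start with the last field
        for (i = NF - 1; i >= 1; i--) out = out ":" $i
        print out
    }' "$@"
}

printf '30:dev:alice\n41:ops:bob\n' | reverse_fields
```

If lines end with a trailing colon, as in age:position:name:, awk sees an empty last field, so the output starts with a colon (:name:position:age). Strip the trailing colon first if that matters.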

Related

How can I remove specific characters in certain lines in a file?

How do I cut the characters in columns 5 to 7 from line 3 onwards?
I am trying to use sed/cut.
For example, if I have:
this is amazing1 this is amazing11
this is amazing2 this is amazing21
this is amazing3 this is amazing31
this is amazing4 this is amazing41
this is amazing5 this is amazing51
this is amazing6 this is amazing61
this is amazing7 this is amazing71
Output should look like:
this is amazing1 this is amazing11
this is amazing2 this is amazing21
this amazing3 this is amazing31
this amazing4 this is amazing41
this amazing5 this is amazing51
this amazing6 this is amazing61
this amazing7 this is amazing71
The characters " is" (columns 5 to 7) are removed from line 3 onwards.
sed -E '3,$s/(....).../\1/' file
I'd just use awk for clarity, portability, etc.:
$ awk 'NR>2{$0=substr($0,1,4) substr($0,8)} 1' file
this is amazing1 this is amazing11
this is amazing2 this is amazing21
this amazing3 this is amazing31
this amazing4 this is amazing41
this amazing5 this is amazing51
this amazing6 this is amazing61
this amazing7 this is amazing71
or using variables populated with the values from your question:
$ awk -v n=3 -v beg=5 -v end=7 'NR>=n{$0=substr($0,1,beg-1) substr($0,end+1)} 1' file
this is amazing1 this is amazing11
this is amazing2 this is amazing21
this amazing3 this is amazing31
this amazing4 this is amazing41
this amazing5 this is amazing51
this amazing6 this is amazing61
this amazing7 this is amazing71
In two steps:
head -n2 infile; tail -n+3 infile | cut --complement -c5-7
The first command prints the first two lines unmodified; the second pipes the lines from the third one onwards to cut, which removes characters 5 to 7 (--complement requires GNU cut).
If you need to do something with the output, like store it in a file, you have to group these commands before redirecting:
{
head -n2 infile
tail -n+3 infile | cut --complement -c5-7
} > outfile
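Without GNU cut, the same result can be reached by listing the columns to keep instead of the ones to drop: POSIX cut -c accepts a list of ranges, so -c1-4,8- keeps everything except columns 5 to 7. A sketch, assuming single-byte characters (GNU cut counts -c positions as bytes) and a hypothetical helper name strip_cols_from_line3:

```shell
# strip_cols_from_line3 (hypothetical name): prints lines 1-2 of FILE
# unchanged, then drops columns 5-7 from every later line.
# "cut -c1-4,8-" keeps columns 1-4 and 8 onward, so no --complement needed.
strip_cols_from_line3() {
    head -n 2 "$1"
    tail -n +3 "$1" | cut -c1-4,8-
}
```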
If you want to use sed:
sed '1,2!s/^\(\w*\)\s*\w*\(.*\)$/\1\2/' file
DETAILS
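1,2!s - Don't do substitutions on lines 1 and 2.
/^\(\w*\)\s*\w*\(.*\)$/ - The matching pattern: the first word (group 1), then the whitespace and second word, then the rest of the line (group 2).
/\1\2/ - Keep only groups 1 and 2, which drops the whitespace and second word in between.
file - Your input file.
Note that this removes the second word rather than fixed columns 5 to 7, and \w and \s are GNU sed extensions.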

Split and concatenating in unix

On UNIX, I have to load file contents into a table when a line starts with the tag:
ACC2020000
Contents in file:
ACC2020000 ALEJA B JURI
I tried the code below:
if(substr($_,0,10) eq 'ACC2020000')
{
$ADDRESS1= (split(" ",$_))[1];
}
Output : ALEJA
Expected Output : ALEJA B JURI
Can anyone suggest how to get the correct output?
You can do this with grep and cut easily, assuming there is a space after the ACC2020000 pattern:
grep '^ACC2020000' file | cut -f2- -d' '
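Note that grep '^ACC2020000' also matches a longer tag such as ACC20200001. If the tag must be the whole first field, an awk variant can compare $1 exactly and then strip that field while preserving the spacing of the rest. In the original Perl, giving split a limit, as in (split(" ", $_, 2))[1], keeps the remainder together in the same way. A sketch with a hypothetical helper name rest_after_tag:

```shell
# rest_after_tag (hypothetical name): for lines whose first field equals TAG,
# print everything after that field, with internal spacing left intact.
rest_after_tag() {
    t=$1; shift
    awk -v tag="$t" '$1 == tag {
        sub(/^[ \t]*[^ \t]+[ \t]*/, "")   # drop the leading tag and following blanks
        print
    }' "$@"
}

printf 'ACC2020000 ALEJA B JURI\nACC20200001 OTHER NAME\n' | rest_after_tag ACC2020000
```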

Get last field using awk substr

I am trying to use awk to get the name of a file given the absolute path to the file.
For example, when given the input path /home/parent/child/filename I would like to get filename
I have tried:
awk -F "/" '{print $5}' input
which works perfectly.
However, I am hard coding $5 which would be incorrect if my input has the following structure:
/home/parent/child1/child2/filename
So a generic solution requires always taking the last field (which will be the filename).
Is there a simple way to do this with the awk substr function?
awk splits each line into fields based on a field separator that you can define. So, setting the field separator to /, you can say:
awk -F "/" '{print $NF}' input
as NF refers to the number of fields of the current record, printing $NF means printing the last one.
So given a file like this:
/home/parent/child1/child2/child3/filename
/home/parent/child1/child2/filename
/home/parent/child1/filename
This would be the output:
$ awk -F"/" '{print $NF}' file
filename
filename
filename
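The same field-splitting idea extends to the Nth-from-last field: $(NF - n + 1) is plain awk arithmetic on NF, so it should behave the same in gawk, mawk and nawk. A sketch with a hypothetical helper name nth_from_last, where n=1 means the last field:

```shell
# nth_from_last (hypothetical name): prints the Nth-from-last /-separated
# field of each input line; lines with fewer than N fields are skipped.
nth_from_last() {
    awk -F/ -v n="$1" 'NF >= n { print $(NF - n + 1) }'
}

echo '/home/parent/child1/child2/filename' | nth_from_last 2
```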
In this case it is better to use basename instead of awk:
$ basename /home/parent/child1/child2/filename
filename
If you're open to a Perl solution, here is one similar to fedorqui's awk solution:
perl -F/ -lane 'print $F[-1]' input
-F/ specifies / as the field separator
$F[-1] is the last element in the @F autosplit array
Another option is to use bash parameter expansion:
$ foo="/home/parent/child/filename"
$ echo ${foo##*/}
filename
$ foo="/home/parent/child/child2/filename"
$ echo ${foo##*/}
filename
I'm about five years late, I know; thanks for all the proposals. I used to do it the following way:
$ echo /home/parent/child1/child2/filename | rev | cut -d '/' -f1 | rev
filename
Glad to see there are better ways.
This should be a comment on the basename answer, but I don't have enough reputation.
If you do not use double quotes, basename will not work with paths that contain a space:
$ basename /home/foo/bar foo/bar.png
bar
It works with quotes:
$ basename "/home/foo/bar foo/bar.png"
bar.png
Example with a file:
$ cat a
/home/parent/child 1/child 2/child 3/filename1
/home/parent/child 1/child2/filename2
/home/parent/child1/filename3
$ while IFS= read -r b; do basename "$b"; done < a
filename1
filename2
filename3
I know I'm about three years late on this, but...
you should consider parameter expansion; it's built in and faster.
If your input is in a variable, say $var1, just use ${var1##*/}. See below:
$ var1='/home/parent/child1/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/child3/filename'
$ echo ${var1##*/}
filename
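When the paths are in a file rather than a single variable, the same expansion can be applied line by line without starting a basename process for every path. A minimal sketch, assuming one path per line on standard input; last_component is a hypothetical name:

```shell
# last_component (hypothetical name): prints the part of each input line
# after the final "/" using parameter expansion only (no external commands).
last_component() {
    while IFS= read -r path; do
        printf '%s\n' "${path##*/}"
    done
}

printf '/home/parent/child 1/child2/filename1\n/home/parent/child1/filename3\n' | last_component
```

Unlike basename, this prints an empty line for a path that ends in /.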
You can skip all of that complex regex:
echo '/home/parent/child1/child2/filename' |
mawk '$!_=$-_=$NF' FS='[/]'
filename
2nd to last :
mawk '$!--NF=$NF' FS='/'
child2
3rd last field :
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$--NF' FS='[/]'
child1
4th-last:
echo '/home/parent/child000/child00/child0/child1/child2/filename' |
mawk '$!--NF=$(--NF-!-FS)' FS='/'
child0
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$(--NF-!-FS)' FS='/'
parent
Major caveat: gawk/nawk differ slightly from mawk in how they track multiple, potentially conflicting, decrements to NF, so apart from the first solution (last field), the others currently work only in mawk-1/2.
Just realized it's much, much cleaner this way in mawk/gawk/nawk:
echo '/home/parent/child1/child2/filename' | …
awk ++NF FS='.+/' OFS=   # updated so that the root "/" still gets printed
filename
You can also use:
sed -n 's/.*\/\([^\/]\{1,\}\)$/\1/p'
or
sed -n 's/.*\/\([^\/]*\)$/\1/p'

How to change the field sequence in cut command in unix

I want to print the fields in a specific order.
Input :
col1|col2|col3|col4
I used cat file | cut -d '|' -f 3,1,4
output :
col1|col3|col4
But my expected output is:
col3|col1|col4
Can anyone help me with this?
From man cut:
Selected input is written in the same order that it is read, and is written exactly once
Use awk instead:
$ awk -F'|' -v OFS='|' '{print $3,$1,$4}' <<< "col1|col2|col3|col4"
col3|col1|col4
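To pass the desired order as a parameter instead of editing the print statement each time, the field list can be split inside awk. A sketch, assuming |-separated input and a hypothetical helper name reorder_fields:

```shell
# reorder_fields (hypothetical name): prints the |-separated fields listed
# in ORDER (comma-separated, 1-based), in exactly that order.
reorder_fields() {
    order=$1; shift
    awk -F'|' -v OFS='|' -v order="$order" '
        BEGIN { n = split(order, idx, ",") }      # idx[1..n] = requested fields
        {
            out = $(idx[1])
            for (i = 2; i <= n; i++) out = out OFS $(idx[i])
            print out
        }' "$@"
}

echo 'col1|col2|col3|col4' | reorder_fields 3,1,4
```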
Even though awk is good, here is a Perl solution:
perl -F"\|" -ane 'print join "|",@F[2,0,3]'
Tested:
> echo "col1|col2|col3|col4" | perl -F"\|" -ane 'print join "|",@F[2,0,3]'
col3|col1|col4

Is there a way to ignore header lines in a UNIX sort?

I have a fixed-width-field file which I'm trying to sort using the UNIX (Cygwin, in my case) sort utility.
The problem is there is a two-line header at the top of the file which is being sorted to the bottom of the file (as each header line begins with a colon).
Is there a way to tell sort either "pass the first two lines across unsorted" or to specify an ordering that sorts the colon lines to the top? The remaining lines always start with a 6-digit number (which is actually the key I'm sorting on), if that helps.
Example:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
500123TSTMY_RADAR00
222334NOTALINEOUT01
477821USASHUTTLES21
325611LVEANOTHERS00
should sort to:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00
(head -n 2 <file> && tail -n +3 <file> | sort) > newfile
The parentheses create a subshell, wrapping up the stdout so you can pipe it or redirect it as if it had come from a single command.
If you don't mind using awk, you can take advantage of awk's built-in pipe abilities, e.g.:
extract_data | awk 'NR<3{print $0;next}{print $0| "sort -r"}'
This prints the first two lines verbatim and pipes the rest through sort -r (drop -r for ascending order).
Note that this has the very specific advantage of being able to selectively sort parts of a piped input. All the other methods suggested will only sort plain files, which can be read multiple times. This works on anything.
In simple cases, sed can do the job elegantly:
your_script | (sed -u 1q; sort)
or equivalently,
cat your_data | (sed -u 1q; sort)
The key is in the 1q -- print first line (header) and quit (leaving the rest of the input to sort).
For the example given, 2q will do the trick.
The -u switch (unbuffered) is required for those seds (notably, GNU's) that would otherwise read the input in chunks, thereby consuming data that you want to go through sort instead.
Here is a version that works on piped data:
(read -r; printf "%s\n" "$REPLY"; sort)
If your header has multiple lines:
(for i in $(seq $HEADER_ROWS); do read -r; printf "%s\n" "$REPLY"; done; sort)
This solution is from here
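A POSIX sh variant of the read-based answer, which avoids seq and bash's $REPLY so it also runs under plain sh, might look like the following. sort_after_header is a hypothetical name; any arguments after the header count are passed to sort.

```shell
# sort_after_header (hypothetical name): copies the first N lines of
# standard input unchanged, then sorts the rest with any extra sort options.
# The shell's read builtin consumes a pipe one line at a time, so the
# remaining lines are still there for sort.
sort_after_header() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ] && IFS= read -r line; do
        printf '%s\n' "$line"
        i=$((i + 1))
    done
    sort "$@"
}

printf ':0:12345\n:1:6:2:3:8:4:2\n500123TSTMY_RADAR00\n010005TSTDOG_FOOD01\n' | sort_after_header 2
```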
You can use tail -n +3 <file> | sort ... (tail will output the file contents from the 3rd line).
head -n 2 <your_file> && nawk 'NR>2' <your_file> | sort
example:
> cat temp
10
8
1
2
3
4
5
> head -n 2 temp && nawk 'NR>2' temp | sort -r
10
8
5
4
3
2
1
It only takes 2 lines of code...
head -1 test.txt > a.tmp;
tail -n+2 test.txt | sort -n >> a.tmp;
For numeric data, -n is required; for an alphabetical sort, leave it out.
Example file:
$ cat test.txt
header
8
5
100
1
-1
Result:
$ cat a.tmp
header
-1
1
5
8
100
Here's a bash function whose arguments are exactly like sort's. It supports both files and pipes.
function skip_header_sort() {
local file=
if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
file=${@: -1}
set -- "${@:1:$(($#-1))}"
fi
awk -v sargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
}
How it works: this line checks whether there is at least one argument and whether the last argument is a file.
if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
This saves the file name in a separate variable (declared local and empty at the top of the function), since we're about to erase the last argument.
file=${@: -1}
Here we remove the last argument, since we don't want to pass it to sort.
set -- "${@:1:$(($#-1))}"
Finally, we do the awk part, passing the remaining arguments (minus the last one if it was the file) to sort inside awk. This was originally suggested by Dave and modified to take sort arguments. We rely on the fact that $file will be empty if we're piping, so the unquoted $file disappears and awk reads standard input (at the cost of not supporting file names with spaces).
awk -v sargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
Example usage with a comma separated file.
$ cat /tmp/test
A,B,C
0,1,2
1,2,0
2,0,1
# SORT NUMERICALLY SECOND COLUMN
$ skip_header_sort -t, -nk2 /tmp/test
A,B,C
2,0,1
0,1,2
1,2,0
# SORT REVERSE NUMERICALLY THIRD COLUMN
$ cat /tmp/test | skip_header_sort -t, -nrk3
A,B,C
0,1,2
2,0,1
1,2,0
Here's a bash shell function derived from the other answers. It handles both files and pipes. First argument is the file name or '-' for stdin. Remaining arguments are passed to sort. A couple examples:
$ hsort myfile.txt
$ head -n 100 myfile.txt | hsort -
$ hsort myfile.txt -k 2,2 | head -n 20 | hsort - -r
The shell function:
hsort ()
{
if [ "$1" == "-h" ]; then
echo "Sort a file or standard input, treating the first line as a header.";
echo "The first argument is the file or '-' for standard input. Additional";
echo "arguments to sort follow the first argument, including other files.";
echo "File syntax : $ hsort file [sort-options] [file...]";
echo "STDIN syntax: $ hsort - [sort-options] [file...]";
return 0;
elif [ -f "$1" ]; then
local file=$1;
shift;
(head -n 1 "$file" && tail -n +2 "$file" | sort "$@");
elif [ "$1" == "-" ]; then
shift;
(read -r; printf "%s\n" "$REPLY"; sort "$@");
else
>&2 echo "Error. File not found: $1";
>&2 echo "Use either 'hsort <file> [sort-options]' or 'hsort - [sort-options]'";
return 1 ;
fi
}
This is the same as Ian Sherbin's answer, but my implementation is:
cut -d'|' -f3,4,7 $arg1 | uniq > filetmp.tc
head -1 filetmp.tc > file.tc;
tail -n+2 filetmp.tc | sort -t"|" -k2,2 >> file.tc;
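Another simple variation on all the others, reading the file only once. It relies on head leaving the file offset just past the header lines, which POSIX requires for seekable input, so it works when the input is a regular file (as here) but not with a pipe: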
HEADER_LINES=2
(head -n $HEADER_LINES; sort) < data-file.dat
With Python:
import sys

HEADER_ROWS = 2

# Copy the header lines through unchanged.
for _ in range(HEADER_ROWS):
    sys.stdout.write(next(sys.stdin))

# Sort the remaining lines and write them out.
for row in sorted(sys.stdin):
    sys.stdout.write(row)
cat file_name.txt | sed 1d | sort
This drops the first line (the header) and sorts the rest; note that the header itself does not appear in the output.
