Sort output from ls command is not alphabetical - ls

ls sorts alphabetically.
The character - has ASCII code 45.
The character 9 has ASCII code 57.
So the file - sorts before 9.
But why does the file 999-a sort after 9999a?
$ ls -1
-
9
9999a
999-a
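One likely explanation, assuming a non-C locale such as en_US.UTF-8 is in effect, is that the locale's collation rules give punctuation like - very little weight, so 999-a is compared roughly as if it were 999a. Forcing byte-order comparison with LC_ALL=C shows the ordering that the ASCII codes predict. The sketch below uses sort instead of ls to avoid creating a file named -; names_in_c_order is a hypothetical helper name, and the four names are the ones from the listing above.

```shell
# Byte-order (ASCII) comparison: '-' (45) sorts before '9' (57) at every position.
names_in_c_order() {
  printf '%s\n' - 9 9999a 999-a | LC_ALL=C sort
}

names_in_c_order
```

With GNU ls, running LC_ALL=C ls -1 in the directory should list the files in the same byte order.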

Related

Awk command to print from the 3rd column to the nth column

How can I print from the 3rd column to the last column using awk in Unix, if a file has 'n' columns? I can do it with cut, but I need an awk command. I tried awk -F " " '{ for(i=3;i<=NF;i++) print $i}' and I get the output, but not in the correct format. Can anyone suggest the proper command?
Combining Ed Morton's answers in:
Print all but the first three columns
delete a column with awk or sed
We get something like this:
awk '{sub(/^(\S+\s*){2}/,""); sub(/(\s*\S+){2}$/,"")}1'
#     ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^
#     remove 2 first cols     remove 2 last cols
Which you can adapt to your exact needs in terms of columns.
See an example given this input:
$ paste -d ' ' <(seq 5) <(seq 2 6) <(seq 3 7) <(seq 4 8) <(seq 5 9)
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
With five columns, removing the first two and the last two leaves just the 3rd column:
$ awk '{sub(/^(\S+\s*){2}/,""); sub(/(\s*\S+){2}$/,"")}1' <(paste -d ' ' <(seq 5) <(seq 2 6) <(seq 3 7) <(seq 4 8) <(seq 5 9))
3
4
5
6
7
Your attempt was close, but it prints every column on its own line.
To correct this we create a variable called 'line' and initialize it to an empty string. The first time we are in the loop we just add the column to 'line'. From that point on we will append to 'line' with the field separator and the next column. Finally, we print 'line'. This will happen for each line in the file.
awk '{line="";for(i=3;i<=NF;i++) if(i==3) line=$i; else line=line FS $i; print line}'
This example assumes awk's default field separator. Also, any line with fewer than three fields prints as a blank line.
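The same loop can be wrapped in a small shell function so the starting column is a parameter. This is only a sketch under the same assumption of the default field separator; print_from is a hypothetical helper name, not something from the answer above.

```shell
# print_from N: print fields N..NF of each input line, joined by OFS (a space).
print_from() {
  awk -v start="$1" '{
    line = ""
    for (i = start + 0; i <= NF; i++) {
      if (i == start) line = $i          # first wanted field: no separator
      else            line = line OFS $i # later fields: separator, then field
    }
    print line
  }'
}

printf 'a b c d e f\ng h i j k l\n' | print_from 3
```

For the two sample lines this prints c d e f and i j k l.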
Assuming your fields are single characters separated by single spaces, as in this sample, then with GNU awk for gensub():
$ cat file
a b c d e f
g h i j k l
$ awk '{print gensub(/(\S\s){2}/,"",1)}' file
c d e f
i j k l
In general to print from, say, field 3 to field 5 if they are blank separated using GNU awk again with gensub():
$ awk '{print gensub(/(\S\s){2}((\S\s){2}\S).*/,"\\2",1)}' file
c d e
i j k
or the 3rd arg to match():
$ awk 'match($0,/(\S\s){2}((\S\s){2}\S)/,a){print a[2]}' file
c d e
i j k
or in general if they are separated by any single character:
$ awk '{print gensub("([^"FS"]"FS"){2}(([^"FS"]"FS"){2}[^"FS"]).*","\\2",1)}' file
c d e
i j k
$ awk 'match($0,"([^"FS"]"FS"){2}(([^"FS"]"FS"){2}[^"FS"])",a){print a[2]}' file
c d e
i j k
If the fields are separated by a string instead of a single character, but the RS is a single character, then you should temporarily change FS to RS (which by definition you KNOW can't be present in the record) so you can negate it in the bracket expressions:
$ cat file
aSOMESTRINGbSOMESTRINGcSOMESTRINGdSOMESTRINGeSOMESTRINGf
gSOMESTRINGhSOMESTRINGiSOMESTRINGjSOMESTRINGkSOMESTRINGl
$ awk -F'SOMESTRING' '{gsub(FS,RS)} match($0,"([^"RS"]"RS"){2}(([^"RS"]"RS"){2}[^"RS"])",a){gsub(RS,FS,a[2]); print a[2]}' file
cSOMESTRINGdSOMESTRINGe
iSOMESTRINGjSOMESTRINGk
If both the FS and the RS are multi-char then there are various options, but the simplest is to use the NUL character, or some other character you know can't appear in your input file, instead of RS as the temporary replacement FS:
$ awk -F'SOMESTRING' '{gsub(FS,"\0")} match($0,/([^\0]\0){2}(([^\0]\0){2}[^\0])/,a){gsub("\0",FS,a[2]); print a[2]}' file
cSOMESTRINGdSOMESTRINGe
iSOMESTRINGjSOMESTRINGk
Obviously change FS to OFS in the final gsub()s above if desired.
If the FS was a regexp instead of a string and you want to retain it in the output then you need to look at GNU awk for the 4th arg for split().
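If GNU awk is not available, a plain loop over the fields can cover the same range-of-fields cases without gensub() or the 3rd arg to match(). This is a sketch rather than part of the answer above: print_range is a hypothetical helper name, and it assumes the separator is a literal string with no regexp metacharacters.

```shell
# print_range FIRST LAST [SEP]: print fields FIRST..LAST of each input line.
# SEP defaults to a single space, which keeps awk's default field splitting.
print_range() {
  awk -v first="$1" -v last="$2" -v sep="${3:- }" '
    BEGIN { if (sep != " ") { FS = sep; OFS = sep } }
    {
      out = ""
      for (i = first + 0; i <= last && i <= NF; i++)
        out = out (i > first ? OFS : "") $i   # separator before all but the first
      print out
    }'
}

printf 'a b c d e f\n' | print_range 3 5
printf 'aSOMESTRINGbSOMESTRINGcSOMESTRINGdSOMESTRINGeSOMESTRINGf\n' | print_range 3 5 SOMESTRING
```

Unlike the regexp versions, this does not depend on the length of each field, at the cost of rebuilding the output with a fixed separator.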
If you don't mind normalizing the space, the most straightforward way is
$ awk '{$1=$2=""}1' | sed -r 's/^ +//'
in action
$ seq 11 40 | pr -6ts' ' | awk '{$1=$2=""}1' | sed -r 's/^ +//'
21 26 31 36
22 27 32 37
23 28 33 38
24 29 34 39
25 30 35 40
for the input
$ seq 11 40 | pr -6ts' '
11 16 21 26 31 36
12 17 22 27 32 37
13 18 23 28 33 38
14 19 24 29 34 39
15 20 25 30 35 40
To print from the third column to the end, blank out the first two fields (this leaves the two leading separators in the output):
awk '{for(i=1;i<3;i++) $i=""; print $0}' filename

How do I use a grep command to display lines that only have a space character?

I have a file called test01; it currently contains:
1 Line one.$
2 This is the second line. $
3 The third $
4 $
5 This is really line 4, with one blank line before. $
6 $
7 $
8 $
9 Five$
10 $
11 Line 6 is this.
12 Seven $
13 $
14 Eighth real line. $
15 $
16 $
17 Line 9 $
18 Line 10 is the last$
19 $
20 $
I need to write a grep command that will only output lines that contain a space character. It shouldn't output lines such as 4 or 6. The desired output should be lines 8, 10 and 20. I've tried grep -vn '[a-z,A-Z,0-9]' test01 however I get the lines that do not contain characters.
Use pattern ^ +$:
grep -E '^ +$' filename.txt
Use -n if you want the line numbers:
$ grep -E -n '^ +$' filename.txt
8:
10:
20:
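grep -En "^ +$" test01
The ^ anchors the match at the start of the line, the space followed by + means "one or more spaces", and the $ anchors the match at the end of the line, so the pattern matches only lines that consist entirely of spaces. The -E option is needed so that + is treated as a repetition operator; in a basic regular expression a bare + is a literal plus sign.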
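The pattern can be checked on a small made-up input. In this sketch space_only_lines is a hypothetical helper name, and the five sample lines are invented for illustration; they are not the contents of test01.

```shell
# Print the numbers of the lines that consist only of one or more spaces.
space_only_lines() {
  grep -En '^ +$' "$@" | cut -d: -f1
}

# Line 2 is empty, lines 3 and 4 hold only spaces, line 5 is text plus a space.
printf 'text\n\n \n   \ntext \n' | space_only_lines
```

This reports lines 3 and 4 only: the empty line 2 is skipped because + requires at least one space.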

Cutting the top two lines after every six lines from a file containing 180 lines

I want to extract the top two lines from every group of six lines in a file of 180 lines. So I should get the 1st and 2nd lines, followed by the 7th and 8th, and so on. I tried using sed for this but did not get the desired output.
Could someone please suggest the logic to implement here?
My requirement is to make some modifications to the first two lines (like removing certain characters) in every set of 6 lines.
Example:
This is line command 1 for my configuration
This is line command 2 for my configuration
This is line command 3 for my configuration
This is line command 4 for my configuration
This is line command 5 for my configuration
This is line command 6 for my configuration
The output I want is:
This is line command 1
This is line command 2
This is line command 3 for my configuration
This is line command 4 for my configuration
This is line command 5 for my configuration
This is line command 6 for my configuration
This has to repeat for every 6 commands out of 180 commands.
You already have an answer from @fedorqui using awk. Here is an approach using GNU sed (the first~step address form is a GNU extension).
sed -n '1~6,2~6p' inputfile
# Example
$ seq 60 | sed -n '1~6,2~6p'
1
2
7
8
13
14
19
20
25
26
31
32
37
38
43
44
49
50
55
56
You can do it using the remainder of the line number divided by 6. If it is 1 or 2, print the line; otherwise, do not.
awk 'NR%6==1 || NR%6==2' file
NR stands for number of records, which in this case is the line number because the default record is a line. || stands for "or". Finally, there is no need to write an explicit print, because printing the record is awk's default action when the condition is true.
Example:
$ seq 60 | awk 'NR%6==1 || NR%6==2'
1
2
7
8
13
14
19
20
25
26
31
32
37
38
43
44
49
50
55
56
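The modulus test generalizes to "the first K lines of every N-line group" by shifting the line number to start at zero. A minimal sketch, with head_of_groups as a hypothetical helper name:

```shell
# head_of_groups N K: print the first K lines of every N-line group on stdin.
# (NR - 1) % n is the zero-based position of the line inside its group.
head_of_groups() {
  awk -v n="$1" -v k="$2" '(NR - 1) % n < k'
}

seq 14 | head_of_groups 6 2
```

With n=6 and k=2 this selects the same lines as NR%6==1 || NR%6==2.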
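Based on your update, this should do it. Each line has eight fields, so $9 is empty; assigning it to fields 6 to 8 blanks them (leaving trailing spaces), and the final 1 prints every line: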
$ awk 'NR%6==1 || NR%6==2 {$6=$7=$8=$9} 1' file
This is line command 1
This is line command 2
This is line command 3 for my configuration
This is line command 4 for my configuration
This is line command 5 for my configuration
This is line command 6 for my configuration
This is line command 7
This is line command 8
This is line command 9 for my configuration
This is line command 10 for my configuration
This is line command 11 for my configuration
This is line command 12 for my configuration
This is line command 13
This is line command 14
This is line command 15 for my configuration
This is line command 16 for my configuration
This is line command 17 for my configuration
This is line command 18 for my configuration
This is line command 19
This is line command 20
This is line command 21 for my configuration
This is line command 22 for my configuration
This is line command 23 for my configuration
This is line command 24 for my configuration
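An alternative sketch that avoids counting fields and leaves no trailing spaces is to delete the unwanted text with sub(). It assumes the part to remove is always the literal suffix " for my configuration", as in the example; trim_group_heads is a hypothetical helper name.

```shell
# On lines 1-2 of every group of 6, strip the fixed suffix; print all lines.
trim_group_heads() {
  awk 'NR % 6 == 1 || NR % 6 == 2 { sub(/ for my configuration$/, "") } 1'
}

# Generate 7 sample lines in the format of the example and filter them.
seq 7 | sed 's/.*/This is line command & for my configuration/' | trim_group_heads
```

Lines 1, 2 and 7 come out shortened, and lines 3 to 6 are unchanged.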

What is the difference between the following three sort commands in unix?

How are the following sort commands in Unix different?
1) sort -k1,4 < file
2) sort -k1,1 -k4,4 < file
3) sort -k1,1 -k2,2 -k3,3 -k4,4 < file
#1 and #2 are especially confusing.
The following example illustrates my point:
$ cat tmp
1 2 3 t
4 2 4 c
5 4 6 c
7 3 20 r
12 3 5 i
2 45 7 a
11 23 53 b
23 43 53 q
11 6 3 c
0 4 3 z
$ diff <(sort -k1,4 tmp) <(sort -k1,1 -k2,2 -k3,3 -k4,4 tmp)
1a2
> 1 2 3 t
5,6d5
< 1 2 3 t
< 23 43 53 q
7a7
> 23 43 53 q
$ diff <(sort -k1,4 tmp) <(sort -k1,1 -k4,4 tmp)
1a2
> 1 2 3 t
5,6d5
< 1 2 3 t
< 23 43 53 q
7a7
> 23 43 53 q
I did look at sort's man page, which says:
-k, --key=POS1[,POS2]
start a key at POS1 (origin 1), end it at POS2 (default end of line)
But I don't understand this explanation. If the key starts at POS1 and ends at POS2, then shouldn't commands #1 and #3 above produce the same results?
The difference is that #1 treats the entire line as a single key, and sorts it lexicographically. The other two have multiple keys, and in particular, while #3 uses the same set of fields as #1, it does so in a very different way. It first sorts the list by the first column (whitespace belongs to the following field, and is significant, unless you specify -b), and if two or more rows have an identical value in the first column, then it uses the second key to sort that subset of rows. If two or more rows are identical in the first two columns, it uses the third key, etc.
In your first case, depending on your locale, you can get different results (try LC_ALL=C sort -k1,4 < file and compare it to, for example, LC_ALL=en_US.utf8 sort -k1,4 < file).
In your second and third cases, the keys are split on transitions from non-whitespace to whitespace. This means the 2nd and following columns have varying-sized whitespace prefixes, which affect the sort order, since you don't specify -b.
Also, if you have a mix of spaces and tabs for lining up your columns, that could be messing with things.
I got the same results as you when I had LC_ALL=en_US.utf8 in my environment, but your expected results (i.e. no differences) with LC_ALL=C (SuSE Enterprise 11.2).
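The locale effect can be checked directly on the sample data from the question. The sketch below compares commands #1 and #3 under LC_ALL=C; the variable names are illustrative only.

```shell
# The ten sample lines from tmp, separated by single spaces.
data='1 2 3 t
4 2 4 c
5 4 6 c
7 3 20 r
12 3 5 i
2 45 7 a
11 23 53 b
23 43 53 q
11 6 3 c
0 4 3 z'

# Command #1 (one key spanning fields 1-4) and command #3 (four separate keys),
# both forced to plain byte-order comparison.
whole=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,4)
split=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2,2 -k3,3 -k4,4)

[ "$whole" = "$split" ] && echo "identical under LC_ALL=C"
```

Dropping LC_ALL=C and rerunning in a locale such as en_US.utf8 is one way to see whether the differences from the question reappear on a given system.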

Using unix join -o to not print the common field

I am using
join -1 2 -2 2 file1.txt file2.txt > file3.txt
to join my two text files based on their second column and write the result to file3.txt, which works perfectly. However, I do not want file3.txt to contain the common field. Googling and join's man page suggest that the -o formatting option could help accomplish this, but how exactly should I go about doing so?
Assuming that each file only has two columns, and you want to join on the second column but show only the first columns of each file in your output, use
join -1 2 -2 2 -o 1.1,2.1 file1.txt file2.txt > file3.txt
Remember that your two files should be sorted on the second column before joining.
An example run:
$ cat file1.txt
2 1
3 2
7 2
8 4
2 6
$ cat file2.txt
3 1
5 4
9 9
$ join -1 2 -2 2 -o 1.1,2.1 file1.txt file2.txt
2 3
8 5
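When the inputs are not already sorted on the join field, the sort step can be folded into a small wrapper. This is a sketch under the same two-column assumption; join_without_key is a hypothetical helper name, and the files are recreated in a temporary directory from the example above.

```shell
# join_without_key FILE1 FILE2: join on column 2 of both files and print only
# column 1 of each, sorting both inputs on the join field first.
join_without_key() {
  a=$(mktemp)
  b=$(mktemp)
  sort -k 2b,2 "$1" > "$a"   # b: ignore leading blanks in the key
  sort -k 2b,2 "$2" > "$b"
  join -1 2 -2 2 -o 1.1,2.1 "$a" "$b"
  rm -f "$a" "$b"
}

dir=$(mktemp -d)
printf '2 1\n3 2\n7 2\n8 4\n2 6\n' > "$dir/file1.txt"
printf '3 1\n5 4\n9 9\n' > "$dir/file2.txt"
join_without_key "$dir/file1.txt" "$dir/file2.txt"
rm -rf "$dir"
```

Each field in the -o list is written as FILENUM.FIELD, so 1.1,2.1 keeps the first column of each file and leaves out the shared second column.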
