Compare two different fields in two files - Unix

I need to compare fields 1 and 5 of fileA against fields 5 and 6 of fileB
and print the lines that have no match:
fileA
ZEROC_ZAR,MKT,M,ZAR,3YEAR,7.59
ZEROC_AED,MKT,M,ZAR,4YEAR,7.84
ZEROC_ZAR,MKT,M,ZAR,5YEAR,8.03
ZEROC_AED,MKT,M,ZAR,7YEAR,8.33
fileB
TKS,010690226,02977,AED,ZEROC_AED,3YEAR
TKS,010690231,02977,AED,ZEROC_AED,4YEAR
TKS,010690233,02977,AED,ZEROC_AED,5YEAR
TKS,010690235,02977,AED,ZEROC_AED,7YEAR
TKS,010690236,02977,AED,ZEROC_AED,10YEAR

This one-liner prints the non-matching lines of fileB:
$ cut -d, -f1,5 fileA | xargs -n1 -I{} grep {} fileB | cat - fileB | sort | uniq -u
TKS,010690226,02977,AED,ZEROC_AED,3YEAR
TKS,010690233,02977,AED,ZEROC_AED,5YEAR
TKS,010690236,02977,AED,ZEROC_AED,10YEAR
Explanation:
First combine fields 1 and 5 of fileA:
$ cut -d, -f1,5 fileA
ZEROC_ZAR,3YEAR
ZEROC_AED,4YEAR
ZEROC_ZAR,5YEAR
ZEROC_AED,7YEAR
Use these strings to grep for matching lines in fileB:
$ cut -d, -f1,5 fileA | xargs -n1 -I{} grep {} fileB
TKS,010690231,02977,AED,ZEROC_AED,4YEAR
TKS,010690235,02977,AED,ZEROC_AED,7YEAR
Then use cat - fileB | sort to combine these two lines with the content of fileB:
$ cut -d, -f1,5 fileA | xargs -n1 -I{} grep {} fileB | cat - fileB | sort
TKS,010690226,02977,AED,ZEROC_AED,3YEAR
TKS,010690231,02977,AED,ZEROC_AED,4YEAR
TKS,010690231,02977,AED,ZEROC_AED,4YEAR
TKS,010690233,02977,AED,ZEROC_AED,5YEAR
TKS,010690235,02977,AED,ZEROC_AED,7YEAR
TKS,010690235,02977,AED,ZEROC_AED,7YEAR
TKS,010690236,02977,AED,ZEROC_AED,10YEAR
Finally, use uniq -u to keep only the lines that occur exactly once, which drops every line that had a match:
$ cut -d, -f1,5 fileA | xargs -n1 -I{} grep {} fileB | cat - fileB | sort | uniq -u
TKS,010690226,02977,AED,ZEROC_AED,3YEAR
TKS,010690233,02977,AED,ZEROC_AED,5YEAR
TKS,010690236,02977,AED,ZEROC_AED,10YEAR
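The same result can probably be had in a single awk pass, without running grep once per key. This is only a sketch: the function name non_matching is made up for illustration, and it assumes fileA is not empty (otherwise the NR == FNR test misfires) and that the fields contain no embedded commas.

```shell
# Print the lines of the second file whose (field 5, field 6) pair does not
# occur as a (field 1, field 5) pair in the first file.
# "non_matching" is an illustrative name, not part of the original answer.
non_matching() {
  awk -F, '
    NR == FNR { seen[$1 FS $5]; next }   # first file: remember each key pair
    !(($5 FS $6) in seen)                # second file: print lines with no match
  ' "$1" "$2"
}

# Usage: non_matching fileA fileB
```

Unlike the grep pipeline, this compares whole fields, so a key such as ZEROC_AED,7YEAR cannot accidentally match a longer value, and the output keeps the original order of fileB.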

Unix command to list all files, grouped and sorted by file type and name

There are lots of files in a directory, and the output should be grouped and sorted as below: first executable files
without any file extension, then sql files ending with "body", then sql files ending with "spec", then
other sql files, then "sh" files, then "txt" files.
abc
1_spec.sql
1_body.sql
2_body.sql
other.sql
a1.sh
a1.txt
find . -maxdepth 1 -type f ! -name "*.*"
find . -type f -name "*body*.sql"
find . -type f -name "*spec*.sql"
It is getting difficult to combine these and keep the groups in order.
With ls, grep and sort you could do something like this script I hacked together:
#!/bin/sh
ls | grep -v '\.[a-zA-Z0-9]*$' | sort
ls | grep '_body\.sql$' | sort
ls | grep '_spec\.sql$' | sort
ls | grep -vE '_body\.sql$|_spec\.sql$' | grep '\.sql$' | sort
ls | grep '\.sh$' | sort
ls | grep '\.txt$' | sort
normal ls:
$ ls -1
1_body.sql
1_spec.sql
2_body.sql
a1.sh
a1.txt
abc
bar.sql
def
foo.sh
other.sql
script
$
sorting script:
$ ./script
abc
def
script
1_body.sql
2_body.sql
1_spec.sql
bar.sql
other.sql
a1.sh
foo.sh
a1.txt
$
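If you would rather not parse ls at all, shell globs can produce the same grouping. This is a rough sketch, not the script above: list_grouped and emit are names invented here, "no extension" is taken to mean "no dot anywhere in the name", and hidden files are skipped because * does not match them.

```shell
# List regular files of the current directory in the requested group order,
# relying on the shell's sorted glob expansion instead of ls | grep | sort.
# "list_grouped" and "emit" are illustrative names.
emit() {
  # Print the name only if it is a regular file (also skips unmatched globs).
  if [ -f "$1" ]; then printf '%s\n' "$1"; fi
}

list_grouped() {
  for f in *; do                       # 1. names without any dot
    case $f in
      *.*) ;;
      *)   emit "$f" ;;
    esac
  done
  for f in *_body.sql *_spec.sql; do   # 2. body files, then 3. spec files
    emit "$f"
  done
  for f in *.sql; do                   # 4. all other sql files
    case $f in
      *_body.sql|*_spec.sql) ;;
      *) emit "$f" ;;
    esac
  done
  for f in *.sh *.txt; do              # 5. sh files, then 6. txt files
    emit "$f"
  done
}

# Usage: cd /some/dir && list_grouped
```

The order inside each group follows the shell's glob sorting, which depends on the locale; set LC_ALL=C if you need plain byte order.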

Can't add double quotes to file's directory

I need to get this result, in this format:
"hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_1ELPC | grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 "
So I tried this instruction:
paste0("hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_","1ELPC",cat(" grep \"^d\" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 "),sep = "")
But this returns:
grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 [1] "hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_1EPSE"
So the problem is with the cat function: I need its result to be part of the quoted string. Also, I can't understand why the output appears in reversed order here.
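I'm assuming you split up the arguments to paste0 for a specific reason. As @RuiBarradas mentions, cat is for printing and does not return an actual object (it always returns NULL). It prints its text as soon as it is evaluated, before paste0 returns anything, which is why the output looked reversed. Pass the string directly instead: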
paste0("hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_",
"1ELPC",
" grep \"^d\" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 ",
sep = "")
returns:
[1] "hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_1ELPC grep \"^d\" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 "
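which looks to me like what you want. (Note that the pieces as written have no | between indicateurs-PUB_1ELPC and grep; if you need the pipe shown in your target string, start the third piece with " | grep".)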
Do note that, in the output \" is one character (a double quote). i.e.,
> nchar("\"")
[1] 1
To further illustrate the point:
temp <- paste0("hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_",
"1ELPC",
" grep \"^d\" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 ",
sep = "")
> cat(temp)
hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_1ELPC grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8
> print(temp, quote = FALSE)
[1] hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_1ELPC grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8

awk to sort two fields:

I would like to sort Input.csv on fields $1 and $5 so that the rows are in A-Z order by country.
If one of the two fields is blank, the country name should be taken from the other one.
Input.csv
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
,,,,mno,50,DL,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
def,20,02-Jul-13,Aug,,,,,
def,20,02-Aug-13,Aug,,,,,
Desired Output.csv
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
def,20,02-Jul-13,Aug,,,,,
def,20,02-Aug-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,mno,50,DL,ABC~XYZ,Sep
I have tried the command below but am not getting the desired output. Please suggest a fix.
head -1 Input.csv > Output.csv; sort -t, -k1,1 -k5,5 <(tail -n +2 Input.csv) >> Output.csv
awk to the rescue!
$ awk -F, '{print ($1==""?$5:$1) "\t" $0}' file | sort | cut -f2-
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
def,20,02-Aug-13,Aug,,,,,
def,20,02-Jul-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,mno,50,DL,ABC~XYZ,Sep
This works because the header starts with an uppercase letter and the data is lowercase, so the header sorts first in a byte-order locale such as LC_ALL=C. If this is not a valid assumption, the header needs special handling, as you did above, or better, within awk:
$ awk -F, 'NR==1{print; next} {print ($1==""?$5:$1) "\t" $0 | "sort | cut -f2-"}' file
Is this what you want? (Omitted first line)
cat file_containing_your_lines | awk 'NR != 1' | sed "s/,/\t/g" | sort -k 1 -k 5 | sed "s/\t/,/g"
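Neither command above keeps rows of the same country in their original order, which the desired output seems to want. A possible variation is to sort on the decorated key only and ask sort to be stable. This is a sketch under assumptions: sort_by_country is a made-up name, and the -s (stable) option is available in GNU and BSD sort but is not required by POSIX.

```shell
# Sort a CSV by country, taking the country from field 1 or, if that is
# empty, from field 5. The header line is passed through untouched and rows
# with the same country keep their input order (stable sort).
# "sort_by_country" is an illustrative name.
sort_by_country() {
  tab=$(printf '\t')
  head -n 1 "$1"                                   # header first
  tail -n +2 "$1" |
    awk -F, '{ print ($1 == "" ? $5 : $1) "\t" $0 }' |   # prepend the sort key
    LC_ALL=C sort -s -t "$tab" -k1,1 |             # sort on the key only
    cut -f2-                                       # drop the key again
}

# Usage: sort_by_country Input.csv > Output.csv
```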

How to add a value/data to end of each row in Unix

I have fileA, fileB data as shown below
fileA
,,"user1","email"
,,"user2","email"
,,"user3","email"
,,"user4","email"
fileB
,,user2,location
,,user4,location
,,user1,location
,,user3,location
I want to look up each fileA user in fileB, get only the location, and append it to fileA (or write the result to another file).
Expected output:
,,"user1","email",location
,,"user2","email",location
,,"user3","email",location
,,"user4","email",location
I'm trying a while loop that reads the username from fileA and searches for it in fileB to get the location, but I fail to add it back to fileA.
Your help is much appreciated.
This should work:
for user in `awk -F\" '{print $2}' fileA`
do
loc=`grep ${user} fileB | awk -F',' '{print $4}'`
sed -i "/${user}/ s/$/,${loc}/" fileA
done
Adding the example:
$ cat fileA
,,"user1","email"
,,"user2","email"
,,"user3","email"
,,"user4","email"
$ cat fileB
,,user2,location2
,,user4,location4
,,user1,location1
,,user3,location3
$ for user in `awk -F\" '{print $2}' fileA`; do echo ${user}; loc=`grep ${user} fileB | awk -F',' '{print $4}'`; echo ${loc}; sed -i "/${user}/ s/$/,${loc}/" fileA; done
$ cat fileA
,,"user1","email",location1
,,"user2","email",location2
,,"user3","email",location3
,,"user4","email",location4
The description is not clear, but based on the question you can use the following command to append a string to the end of each matching row:
sed -i '/search_pattern/ s/$/string_to_be_appended/' filename
You can do this entirely in awk
awk -F, '
NR==FNR{a[$3]=$4;next}
{for(x in a) if(index($3,x)>0) print $0","a[x]}' file2 file1
Test:
$ cat file1
,,"user1","email"
,,"user2","email"
,,"user3","email"
,,"user4","email"
$ cat file2
,,user2,location2
,,user4,location4
,,user1,location1
,,user3,location3
$ awk -F, 'NR==FNR{a[$3]=$4;next}{for(x in a) if(index($3,x)>0) print $0","a[x]}' file2 file1
,,"user1","email",location1
,,"user2","email",location2
,,"user3","email",location3
,,"user4","email",location4
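One caveat with both the grep loop and the index() test is that they match substrings, so user1 would also match user10. A variation that strips the quotes and looks the key up exactly might look like the sketch below; append_location is a hypothetical name, and it assumes the user is always field 3 and the location field 4.

```shell
# Append the location from the lookup file to each line of the data file,
# matching the user name exactly after removing the double quotes.
# Lines without a match are printed unchanged.
# "append_location" is an illustrative name.
append_location() {
  # $1 = file with quoted users (fileA), $2 = lookup file (fileB)
  awk -F, '
    NR == FNR { loc[$3] = $4; next }     # lookup file: user -> location
    {
      key = $3
      gsub(/"/, "", key)                 # "user1" -> user1
      if (key in loc) print $0 "," loc[key]
      else print
    }
  ' "$2" "$1"
}

# Usage: append_location fileA fileB > fileA.new
```

This also keeps the lines in the order of fileA, whereas for (x in a) iterates in an unspecified order.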

Sorting file names by length of the file name

ls displays the files available in a directory. I want the file names to be displayed based on the length of the file name.
Any help will be highly appreciated.
Thanks in Advance
The simplest way is just:
$ ls | perl -e 'print sort { length($b) <=> length($a) } <>'
You can do it like this:
for i in `ls`; do LEN=`expr length $i`; echo $LEN $i; done | sort -n
make test files:
mkdir -p test; cd test
touch short-file-name medium-file-name loooong-file-name
the script:
ls |awk '{print length($0)"\t"$0}' |sort -n |cut --complement -f1
output:
short-file-name
medium-file-name
loooong-file-name
for i in *; do printf "%d\t%s\n" "${#i}" "$i"; done | sort -n | cut -f2-
TL;DR
Command:
find . -maxdepth 1 -type f -print0 | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | perl -F'/\0/' -ape '$_=join("\n", sort { length($b) <=> length($a) } @F)' | sed 's#/#/\\n/#g'
Alternate version of command that's easier to read:
find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g'
Not Parsing ls Output AND Benchmarking
There are good answers here. However, if one wants to follow the advice not to parse the output of ls, here are some ways to get the job done. This will especially take care of the situation where you have spaces in filenames. I'm going to benchmark everything here as well as the ls-parsing examples. (Hopefully I get to that soon.) I've put together a bunch of somewhat-random filenames that I've downloaded from different places over the last 25 years or so, 73 to begin with. All 73 are 'normal' filenames, with only alphanumeric characters, underscores, dots, and hyphens. I'll add 2 more which I make now (in order to show problems with some sorts).
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ mkdir ../dir_w_fnames__spaces
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ cp ./* ../dir_w_fnames__spaces/
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ cd ../dir_w_fnames__spaces/
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ touch "just one file with a really long filename that can throw off some counts bla so there"
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ mkdir ../dir_w_fnames__spaces_and_newlines
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ cp ./* ../dir_w_fnames__spaces_and_newlines/
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ cd ../dir_w_fnames__spaces_and_newlines/
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ touch $'w\nlf.aa'
This one, i.e. the filename,
w
lf.aa
stands for with linefeed - I make it like this to make it easier to see the problems. I don't know why I chose .aa as the file extension, other than the fact that it made this filename length easily visible in the sorts.
Now, I'm going back to the orig_dir_73 directory; just trust me that this directory only contains files. We'll use a surefire way to get the number of files.
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ du --inodes
74 .
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ # The 74th inode is for the current directory, '.'; we have 73 files
There's a more surefire way, which doesn't depend on the directory only having files and doesn't require you to remember the extra '.' inode. I just looked through the man page, did some research, and did some experimentation. This command is
awk -F"\0" '{print NF-1}' < <(find . -maxdepth 1 -type f -print0) | awk '{sum+=$1}END{print sum}'
or, in more-readable fashion,
awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
Let's find out how many files we have
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
73
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ cd ../dir_w_fnames__spaces
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
74
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ cd ../dir_w_fnames__spaces_and_newlines/
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
75
(See [ 1 ] for details and an edge case for a previous solution that led to the command here now.)
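A shorter way to get the same count, still safe for names containing newlines, is to count the NUL terminators that find emits, one per file. This is only a sketch: count_files is an illustrative name, and it assumes a find that supports -maxdepth and -print0 (GNU and BSD find do, POSIX does not require them).

```shell
# Count the regular files directly inside a directory (default: the current
# one) by counting the NUL bytes that find prints after each name.
# "count_files" is an illustrative name.
count_files() {
  find "${1:-.}" -maxdepth 1 -type f -print0 |
    tr -cd '\0' |      # keep only the NUL terminators
    wc -c |            # one byte per file
    tr -d ' '          # some wc implementations pad the number with spaces
}

# Usage: count_files            (current directory)
#        count_files /some/dir
```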
I'll be switching back and forth between these directories; just make sure you pay attention to the path - I won't note every switch.
* Usable even with weird filenames (containing spaces, linefeeds, etc.)
1a. Perl à la @tchrist with Additions
Using find with null separator. Hacking around newlines in a filename.
Command:
find . -maxdepth 1 -type f -print0 | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | perl -F'/\0/' -ape '$_=join("\n", sort { length($b) <=> length($a) } @F)' | sed 's#/#/\\n/#g'
Alternate version of command that's easier to read:
find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g'
I'll actually show part of the sort results to show that the following command works. I'll also show how I'm checking that weird filenames aren't breaking anything.
Note that one wouldn't usually use head or tail if one wants the whole, sorted list (hopefully, it's not a sordid list). I'm using those commands for demonstration.
First, 'normal' filenames.
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
17f09d51d6280fb8393d5f321f344f616c461a57a8b9cf9cc3099f906b567c992.txt
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g' | tail -n 5
137.csv
13.csv
o6.dat
3.csv
a.dat
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ # No spaces in fnames, so...
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ find . -maxdepth 1 -type f | wc -l
73
Works for normal filenames
Next: spaces
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
just one file with a really long filename that can throw off some counts bla so there
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
Works for filenames containing spaces
Next: newline
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g' | tail -8
Lk3f.png
LOqU.txt
137.csv
w/\n/lf.aa
13.csv
o6.dat
3.csv
a.dat
If you prefer, you can also change this command a bit, so the filename comes out with the linefeed "evaluated".
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#\n#g' | tail -8
LOqU.txt
137.csv
w
lf.aa
13.csv
o6.dat
3.csv
a.dat
In either case, you will know, due to what we've been doing, that the list is sorted, even though it doesn't appear so.
(Visual on not appearing sorted by filename length)
********
********
*******
********** <-- Visual Problem
*****
*****
****
****
OR
********
*******
* <-- Visual
**** <-- Problems
*****
*****
****
****
Works for filenames containing newlines
* 2a. Very Close, but Doesn't Keep Newline Filename Together - à la @cpasm
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ for i in *; do printf "%d\t%s\n" "${#i}" "$i"; done | sort -n | cut -f2- | head
lf.aa
3.csv
a.dat
13.csv
o6.dat
137.csv
w
1UG5.txt
1uWj.txt
2Ese.txt
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ for i in *; do printf "%d\t%s\n" "${#i}" "$i"; done | sort -n | cut -f2- | tail -5
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
just one file with a really long filename that can throw off some counts bla so there
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
Note, for the head part, that the w in
w(\n)
lf.aa
is in the correct, sorted position for the 7-character-long filename that it is (the newline counts as one character). However, the lf.aa is not in a logical place.
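The loop in 2a can probably be made to keep such a filename together by terminating each record with NUL instead of a newline and telling sort and cut to do the same. This is a sketch that assumes GNU coreutils (sort -z and cut -z are GNU extensions); sort_by_name_length is a name invented here.

```shell
# Print the regular files of the current directory, shortest name first,
# as NUL-terminated records so that names containing newlines stay intact.
# "sort_by_name_length" is an illustrative name; -z needs GNU sort and cut.
sort_by_name_length() {
  for f in *; do
    if [ -f "$f" ]; then
      printf '%d\t%s\0' "${#f}" "$f"   # "<length><TAB><name><NUL>"
    fi
  done |
    sort -z -n |     # numeric sort on the length prefix, NUL-delimited
    cut -z -f2-      # drop the length, keep the name
}

# Usage (for display only, newlines inside names become visible again):
#   sort_by_name_length | tr '\0' '\n'
# Usage (safe for further processing):
#   sort_by_name_length | xargs -0 -n 1 printf '%s\n'
```

Add -r to sort if you want the longest names first, as in the Perl versions.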
* Less-Easily Breakable (only '\n' and possibly command characters could be a problem)
1b. Perl à la @tchrist with find, not ls
Using find with null separator and xargs.
Command:
find . -maxdepth 1 -type f -print0 | xargs -I'{}' -0 echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | perl -e 'print sort { length($b) <=> length($a) } <>'
Alternate version of command that's easier to read:
find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | \
perl -e 'print sort { length($b) <=> length($a) } <>'
Let's go for it.
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | \
perl -e 'print sort { length($b) <=> length($a) } <>' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
17f09d51d6280fb8393d5f321f344f616c461a57a8b9cf9cc3099f906b567c992.txt
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | \
perl -e 'print sort { length($b) <=> length($a) } <>' | tail -8
IKlT.txt
Lk3f.png
LOqU.txt
137.csv
13.csv
o6.dat
3.csv
a.dat
Works for normal filenames
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | \
perl -e 'print sort { length($b) <=> length($a) } <>' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
just one file with a really long filename that can throw off some counts bla so there
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
Works for filenames containing spaces
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | \
perl -e 'print sort { length($b) <=> length($a) } <>' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
just one file with a really long filename that can throw off some counts bla so there
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' |
perl -e 'print sort { length($b) <=> length($a) } <>' | tail -8
LOqU.txt
137.csv
13.csv
o6.dat
3.csv
a.dat
lf.aa
w
WARNING
BREAKS for filenames containing newlines
1c. Good for normal filenames and filenames with spaces, but breakable with filenames containing newlines - à la @tchrist
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ ls | perl -e 'print sort { length($b) <=> length($a) } <>' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
just one file with a really long filename that can throw off some counts bla so there
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ ls | perl -e 'print sort { length($b) <=> length($a) } <>' | tail -8
LOqU.txt
137.csv
13.csv
o6.dat
3.csv
a.dat
lf.aa
w
3a. Good for normal filenames and filenames with spaces, but breakable with filenames containing newlines - à la @Peter_O
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ ls | awk '{print length($0)"\t"$0}' | sort -n | cut --complement -f1 | head -n 8
w
3.csv
a.dat
lf.aa
13.csv
o6.dat
137.csv
1UG5.txt
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ ls | awk '{print length($0)"\t"$0}' | sort -n | cut --complement -f1 | tail -5
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
just one file with a really long filename that can throw off some counts bla so there
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
* More-Easily Breakable
4a. Good for normal filenames - à la @Raghuram
This version is breakable with filenames containing either spaces or newlines (or both).
I do want to add that I do like the display of the actual string length, if just for analysis purposes.
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ for i in `ls`; do LEN=`expr length $i`; echo $LEN $i; done | sort -n | head -n 20
1 a
1 w
2 so
3 bla
3 can
3 off
3 one
4 file
4 just
4 long
4 some
4 that
4 with
5 3.csv
5 a.dat
5 lf.aa
5 there
5 throw
6 13.csv
6 counts
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ for i in `ls`; do LEN=`expr length $i`; echo $LEN $i; done | sort -n | tail -5
69 17f09d51d6280fb8393d5f321f344f616c461a57a8b9cf9cc3099f906b567c992.txt
70 83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
76 79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
87 oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
238 68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
Explanation of Some Commands
For now, I'll only note that, with the works-for-all find command, I used '/' for the newline substitute because, apart from NUL, it is the only character that cannot appear in a filename on *NIX, and it is illegal in filenames on Windows as well.
Note(s)
[ 1 ] The command used,
du --inodes --files0-from=<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=int($1)}END{print sum}'
will work in this case, because when there is a file with a newline, and therefore an "extra" line in the output of the find command, awk's int function will evaluate to 0 for the text of that line. Specifically, for our newline-containing filename, w\nlf.aa, i.e.
w
lf.aa
we will get
$ awk '{print int($1)}' < <(echo "lf.aa")
0
If you have a situation where the filename is something like
firstline\n3 and some other\n1\n2\texciting\n86stuff.jpg
i.e.
firstline
3 and some other
1
2 exciting
86stuff.jpg
well, I guess the computer has beaten me. If anyone has a solution, I'd be glad to hear it.
Edit: I think I'm way too deep into this question. From this SO answer and experimentation, I got this command (I don't understand all the details, but I've tested it pretty well):
awk -F"\0" '{print NF-1}' < <(find . -maxdepth 1 -type f -print0) | awk '{sum+=$1}END{print sum}'
More readably:
awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
You can use
ls --color=never --indicator-style=none | awk '{print length, $0}' |
sort -n | cut -d" " -f2-
To see it in action, create some files
% touch a ab abc
and some directories
% mkdir d de def
Output of the normal ls command
% ls
a ab abc d/ de/ def/
Output from the proposed command
% ls --color=never --indicator-style=none | awk '{print length, $0}' |
sort -n | cut -d" " -f2-
a
d
ab
de
abc
def
