Compare two files and print only the unmatched values UNIX - unix

I have two files, say 1st.dat:
a
b
c
d
and another file, say 2nd.dat:
d
e
f
g
My output should be:
a
b
c
e
f
g
I have tried diff and sdiff, but neither gives the output shown above. Please help.

You can use grep
grep -vf 2nd.dat 1st.dat > out.dat && grep -vf 1st.dat 2nd.dat >> out.dat

Try diff like this:
diff 1st.dat 2nd.dat|grep -e '<' -e '>'|sed -e 's/< \+//g' -e 's/> \+//g' > output.txt
cat output.txt
a
b
c
e
f
g

I think your version of uniq doesn't support the option --unique.
Can't harm to try:
cat 1st.dat 2nd.dat | sort | uniq --unique
This would be much easier than
cat 1st.dat 2nd.dat |grep -vf <(sort 1st.dat 2nd.dat | uniq -d)
A solution with grep -vf needs more care (it matches substrings and mishandles empty lines). If you do go for grep -vf, the solution of @Daniel is easier to understand and modify, and it avoids a slow sort.
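The substring and empty-line concerns can be handled with grep's -F (fixed strings) and -x (whole-line match) options. The following is only a sketch: the scratch directory and the extra line ab in 2nd.dat are made up here, to show a case where plain grep -vf would drop a line because of a substring match. It also shows uniq -u, the portable spelling of --unique.

```shell
# Hypothetical demo data in a scratch directory; "ab" is added for illustration.
workdir=$(mktemp -d)
printf '%s\n' a b c d > "$workdir/1st.dat"
printf '%s\n' d e f g ab > "$workdir/2nd.dat"

# -F: treat patterns as fixed strings, -x: match whole lines only,
# -v: keep the lines that do NOT match any pattern.
unmatched=$(
  grep -Fxv -f "$workdir/2nd.dat" "$workdir/1st.dat"
  grep -Fxv -f "$workdir/1st.dat" "$workdir/2nd.dat"
)
printf '%s\n' "$unmatched"

# Portable spelling of "uniq --unique"; assumes no line repeats within one file.
unique_sorted=$(LC_ALL=C sort "$workdir/1st.dat" "$workdir/2nd.dat" | uniq -u)
printf '%s\n' "$unique_sorted"

rm -r "$workdir"
```

With plain grep -vf, the pattern a from 1st.dat would also remove the line ab from the second half of the output.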

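You can use the comm command, like this: comm -3 1st.dat 2nd.dat.
It prints the lines that are not common to both files. Note that comm expects sorted input, and lines found only in the second file are indented with a tab.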
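A minimal sketch of the comm approach, assuming the files from the question are recreated in a scratch directory (the directory and the .sorted file names are made up for this example). The tr step removes the tab that comm puts in front of second-file lines, so it would also remove tabs that belong to the data.

```shell
# Hypothetical demo: recreate the question's two files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' a b c d > "$workdir/1st.dat"
printf '%s\n' d e f g > "$workdir/2nd.dat"

# comm expects sorted input, so work on sorted copies.
sort "$workdir/1st.dat" > "$workdir/1st.sorted"
sort "$workdir/2nd.dat" > "$workdir/2nd.sorted"

# -3 suppresses the column of lines common to both files; tr strips the
# leading tab that marks lines found only in the second file.
unmatched=$(comm -3 "$workdir/1st.sorted" "$workdir/2nd.sorted" | tr -d '\t')
printf '%s\n' "$unmatched"

rm -r "$workdir"
```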

diff -c file1 file2 | grep '^- \|^+ ' | tr '+' ' ' | tr '-' ' '
file 1:
a
b
c
d
file 2:
d
e
f
g
output
a
b
c
e
f
g

Related

Unix command for list all the files by grouping and sorting by file type and name

There are lots of files in a directory, and the output should be grouped and sorted as below: first executables
without any file extension, then sql files ending with "body", then sql files ending with "spec", then
other sql files, then "sh" files and then "txt" files.
abc
1_spec.sql
1_body.sql
2_body.sql
other.sql
a1.sh
a1.txt
find . -maxdepth 1 -type f ! -name "*.*"
find . -type f -name "*body*.sql"
find . -type f -name "*spec*.sql"
It is getting difficult to combine these and to sort the groups in order.
With ls, grep and sort you could do something like this script I hacked together:
#!/bin/sh
ls | grep -v '\.[a-zA-Z0-9]*$' | sort
ls | grep '_body\.sql$' | sort
ls | grep '_spec\.sql$' | sort
ls | grep -vE '_body\.sql$|_spec\.sql$' | grep '\.sql$' | sort
ls | grep '\.sh$' | sort
ls | grep '\.txt$' | sort
normal ls:
$ ls -1
1_body.sql
1_spec.sql
2_body.sql
a1.sh
a1.txt
abc
bar.sql
def
foo.sh
other.sql
script
$
sorting script:
$ ./script
abc
def
script
1_body.sql
2_body.sql
1_spec.sql
bar.sql
other.sql
a1.sh
foo.sh
a1.txt
$
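The same grouping can also be done in a single pass, by tagging each name with a group rank and letting sort order by rank and then by name. This is only a sketch under assumptions: the scratch directory and file names are made up, names that match none of the rules are silently dropped, and, like the script above, it parses ls output and so assumes names without newlines.

```shell
# Hypothetical demo files in a scratch directory.
workdir=$(mktemp -d)
for name in abc 1_spec.sql 1_body.sql 2_body.sql other.sql a1.sh a1.txt; do
  touch "$workdir/$name"
done

grouped=$(
  ls "$workdir" | awk '
    # Prefix each name with its group rank; the first matching rule wins.
    !/\./          { print 1 "\t" $0; next }
    /_body\.sql$/  { print 2 "\t" $0; next }
    /_spec\.sql$/  { print 3 "\t" $0; next }
    /\.sql$/       { print 4 "\t" $0; next }
    /\.sh$/        { print 5 "\t" $0; next }
    /\.txt$/       { print 6 "\t" $0; next }
  ' | LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k2 | cut -f2-
)
printf '%s\n' "$grouped"

rm -r "$workdir"
```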

Distorted output while reading in the fast appending logs using Tail in Unix

I am using tail to read close to 460 log files, all of which are being appended to at the same time. The data I am trying to read is byte-separated, fixed width. Please find below the command I use:
find ###Directory### -mmin -2 -type f -name FileNameString*.log | xargs tail -qf -n -1
The expected format of log files is given below:
KS0A2018020723594007G58P5CNSSHAGPRGGWS G NH 0962201803061535PEK HND C 999 9 9CC91 990C 900 99
KS0A2018020723594007G58P5CNSSHAGPRGGWS G NH 5702201803060910PEK NRT C 444 0 4 0 40 00 44
but the format I see in the output is as below:
KS0A2018020723594912V1KY7USSCNTNPRAAPI P AA 3735201802111632IAH OR3903G7YI0HKSQUNAPRAAPI P AA 1583201812241935DEN DFW P 7 7 777777777 7 7 7 7
KS0A2018020723593952G56SCKRSGKORPRGFLCNG AZ 0758201809301515FCO ICN P07100007017070010 00 7007
The tail function is distorting the way files are being read.
Any guidance in reading the format right using tail or any equivalent command will greatly help.
You need the -z option for tail.
$ find /path/to/ -mmin -2 -type f -name FileNameString*.log | xargs tail -qf -z -n -1
-z, --zero-terminated
line delimiter is NUL, not newline
Better to use -exec with find:
$ find /path/to/ -mmin -2 -type f -name "FileNameString*.log" -exec tail -qf -z -n -1 {} \+
If we run xargs without specifying how its arguments are passed, the output was distorted in my case (the lines get broken up).
So,
find ###Directory### -mmin -2 -type f -name FileNameString*.log | xargs tail -qf -n -1
led to distorted output, as there is no controlled way to read the input or write the output.
However, when I passed the input arguments to xargs in a controlled manner using -I, it worked well. In my case,
find ###Directory### -mmin -2 -type f -name FileNameString*.log | xargs -I% tail % -qf -n -1
produced the output format I expected. However, this can be on the slower side if the list of arguments passed to xargs is long.

Make each 2 rows a separated column in UNIX

Hello, I have this file.txt:
a=a
b=b
c=c
d=d
e=e
f=f
.
etc
(about 150 rows)
I need the output to be:
a b c d e f ....
a b c d e f ....
I already tried paste -d " " - - < file.txt, but I need something that turns a large number of rows into columns.
Thank you in advance.
Try this :
awk -F= '{
  arr1[NR]=$1
  arr2[NR]=$2
}
END{
  # Loop by index: "for (i in arr)" does not guarantee input order.
  for (i=1; i<=NR; i++) {
    printf("%s ", arr1[i])
  }
  print ""
  for (i=1; i<=NR; i++) {
    printf("%s ", arr2[i])
  }
  print ""
}' file
Output:
a b c d e f
a b c d e f
You can split each line using the internal field separator:
while IFS== read -r left right; do echo "$left"; done < "test.txt" | xargs
This gives you the left side. For the right side, you could do
while IFS== read -r left right; do echo "$right"; done < "test.txt" | xargs
If you are talking about only 150 rows, scanning the file twice should be fine.
mash of echo, cut and tr
$ cat ip.txt
a=1
b=2
c=3
d=4
$ echo $(cut -d= -f1 ip.txt | tr '\n' ' ') ; echo $(cut -d= -f2 ip.txt | tr '\n' ' ')
a b c d
1 2 3 4
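Another option, sketched here under the assumption that the sample ip.txt above is recreated in a scratch directory (the directory is made up for this example), is to let paste -s join each column into one line. It does not depend on the number of rows.

```shell
# Hypothetical demo: recreate the sample ip.txt in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' a=1 b=2 c=3 d=4 > "$workdir/ip.txt"

# cut picks one side of "="; paste -s joins all lines into a single line,
# using a space as the separator.
keys=$(cut -d= -f1 "$workdir/ip.txt" | paste -s -d ' ' -)
values=$(cut -d= -f2 "$workdir/ip.txt" | paste -s -d ' ' -)
printf '%s\n%s\n' "$keys" "$values"

rm -r "$workdir"
```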

Unix Command for counting number of words which contains letter combination (with repeats and letters in between)

How would you count the number of words in a text file that contain all of the letters a, b, and c? These letters may occur more than once in the word and the word may contain other letters as well. (For example, "cabby" should be counted.)
Using sample input which should return 2:
abc abb cabby
I tried both:
grep -E "[abc]" test.txt | wc -l
grep 'abcdef' testCount.txt | wc -l
both of which return 1 instead of 2.
Thanks in advance!
You can use awk and use the return value of sub function. If successful substitution is made, the return value of the sub function will be the number of substitutions done.
$ echo "abc abb cabby" |
awk '{
for(i=1;i<=NF;i++)
if(sub(/a/,"",$i)>0 && sub(/b/,"",$i)>0 && sub(/c/,"",$i)>0) {
count+=1
}
}
END{print count}'
2
We require the return value to be greater than 0 for all three letters. The for loop iterates over every word of every line, incrementing the counter when all three letters are found in the word.
I don't think you can get around using multiple invocations of grep. Thus I would go with (GNU grep):
<file grep -oE '\w+' | grep a | grep b | grep c
Output:
abc
cabby
The first grep puts each word on a line of its own.
Try this, it will work
sed 's/ /\n/g' test.txt |grep a |grep b|grep c
$ cat test.txt
abc abb cabby
$ sed 's/ /\n/g' test.txt |grep a |grep b|grep c
abc
cabby
hope this helps..
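The grep chains above print the matching words, while the question asks for a count. A small sketch that returns the count, assuming words are separated by whitespace and using a scratch copy of the sample input (the directory is made up for this example):

```shell
# Hypothetical demo: the sample input from the question in a scratch file.
workdir=$(mktemp -d)
printf '%s\n' 'abc abb cabby' > "$workdir/test.txt"

# Put each word on its own line, keep words containing a and b,
# then let the last grep count (-c) the ones that also contain c.
count=$(tr -s '[:space:]' '\n' < "$workdir/test.txt" | grep a | grep b | grep -c c)
echo "$count"

rm -r "$workdir"
```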

Sorting file names by length of the file name

ls displays the files available in a directory. I want the file names to be displayed based on the length of the file name.
Any help will be highly appreciated.
Thanks in Advance
The simplest way is just:
$ ls | perl -e 'print sort { length($b) <=> length($a) } <>'
You can do like this
for i in `ls`; do LEN=`expr length $i`; echo $LEN $i; done | sort -n
make test files:
mkdir -p test; cd test
touch short-file-name medium-file-name loooong-file-name
the script:
ls |awk '{print length($0)"\t"$0}' |sort -n |cut --complement -f1
output:
short-file-name
medium-file-name
loooong-file-name
for i in *; do printf "%d\t%s\n" "${#i}" "$i"; done | sort -n | cut -f2-
TL;DR
Command:
find . -maxdepth 1 -type f -print0 | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | perl -F'/\0/' -ape '$_=join("\n", sort { length($b) <=> length($a) } @F)' | sed 's#/#/\\n/#g'
Alternate version of command that's easier to read:
find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g'
Not Parsing ls Output AND Benchmarking
There are good answers here. However, if one wants to follow the advice not to parse the output of ls, here are some ways to get the job done. This will especially take care of the situation where you have spaces in filenames. I'm going to benchmark everything here as well as the parsing-ls examples. (Hopefully I get to that soon.) I've put together a bunch of somewhat-random filenames that I've downloaded from different places over the last 25 years or so, 73 to begin with. All 73 are 'normal' filenames, with only alphanumeric characters, underscores, dots, and hyphens. I'll add 2 more which I make now (in order to show problems with some sorts).
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ mkdir ../dir_w_fnames__spaces
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ cp ./* ../dir_w_fnames__spaces/
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ cd ../dir_w_fnames__spaces/
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ touch "just one file with a really long filename that can throw off some counts bla so there"
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ mkdir ../dir_w_fnames__spaces_and_newlines
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ cp ./* ../dir_w_fnames__spaces_and_newlines/
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ cd ../dir_w_fnames__spaces_and_newlines/
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ touch $'w\nlf.aa'
This one, i.e. the filename,
w
lf.aa
stands for with linefeed - I make it like this to make it easier to see the problems. I don't know why I chose .aa as the file extension, other than the fact that it made this filename length easily visible in the sorts.
Now, I'm going back to the orig_dir_73 directory; just trust me that this directory only contains files. We'll use a surefire way to get the number of files.
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ du --inodes
74 .
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ # The 74th inode is for the current directory, '.'; we have 73 files
There's a more surefire way, which doesn't depend on the directory only having files and doesn't require you to remember the extra '.' inode. I just looked through the man page, did some research, and did some experimentation. This command is
awk -F"\0" '{print NF-1}' < <(find . -maxdepth 1 -type f -print0) | awk '{sum+=$1}END{print sum}'
or, in more-readable fashion,
awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
Let's find out how many files we have
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
73
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ cd ../dir_w_fnames__spaces
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
74
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ cd ../dir_w_fnames__spaces_and_newlines/
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
75
(See [ 1 ] for details and an edge case for a previous solution that led to the command here now.)
I'll be switching back and forth between these directories; just make sure you pay attention to the path - I won't note every switch.
* Usable even with weird filenames (containing spaces, linefeeds, etc.)
1a. Perl à la @tchrist with Additions
Using find with null separator. Hacking around newlines in a filename.
Command:
find . -maxdepth 1 -type f -print0 | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | perl -F'/\0/' -ape '$_=join("\n", sort { length($b) <=> length($a) } @F)' | sed 's#/#/\\n/#g'
Alternate version of command that's easier to read:
find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g'
I'll actually show part of the sort results to show that the following command works. I'll also show how I'm checking that weird filenames aren't breaking anything.
Note that one wouldn't usually use head or tail if one wants the whole, sorted list (hopefully, it's not a sordid list). I'm using those commands for demonstration.
First, 'normal' filenames.
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
17f09d51d6280fb8393d5f321f344f616c461a57a8b9cf9cc3099f906b567c992.txt
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g' | tail -n 5
137.csv
13.csv
o6.dat
3.csv
a.dat
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ # No spaces in fnames, so...
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ find . -maxdepth 1 -type f | wc -l
73
Works for normal filenames
Next: spaces
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
just one file with a really long filename that can throw off some counts bla so there
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
Works for filenames containing spaces
Next: newline
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#/\\n/#g' | tail -8
Lk3f.png
LOqU.txt
137.csv
w/\n/lf.aa
13.csv
o6.dat
3.csv
a.dat
If you prefer, you can also change this command a bit, so the filename comes out with the linefeed "evaluated".
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ find . -maxdepth 1 -type f -print0 | \
sed 's#\./.*/\([^/]\+\)\./$#\1#g' | tr '\n' '/' | \
perl -F'/\0/' -ape \
'$_=join("\n", sort { length($b) <=> length($a) } @F)' | \
sed 's#/#\n#g' | tail -8
LOqU.txt
137.csv
w
lf.aa
13.csv
o6.dat
3.csv
a.dat
In either case, you will know, due to what we've been doing, that the list is sorted, even though it doesn't appear so.
(Visual on not appearing sorted by filename length)
********
********
*******
********** <-- Visual Problem
*****
*****
****
****
OR
********
*******
* <-- Visual
**** <-- Problems
*****
*****
****
****
Works for filenames containing newlines
* 2a. Very Close, but Doesn't Keep Newline Filename Together - à la @cpasm
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ for i in *; do printf "%d\t%s\n" "${#i}" "$i"; done | sort -n | cut -f2- | head
lf.aa
3.csv
a.dat
13.csv
o6.dat
137.csv
w
1UG5.txt
1uWj.txt
2Ese.txt
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ for i in *; do printf "%d\t%s\n" "${#i}" "$i"; done | sort -n | cut -f2- | tail -5
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
just one file with a really long filename that can throw off some counts bla so there
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
Note, for the head part, that the w in
w(\n)
lf.aa
is in the correct, sorted position for the 6-character-long filename that it is. However, the lf.aa is not in a logical place.
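One way to keep such a name in one piece with the same loop is to terminate each record with NUL instead of a newline. The sketch below assumes GNU sort and GNU cut (both accept -z for NUL-terminated records); the scratch directory and file names are made up, and the final tr to '|' is only there so that the record boundaries are visible when printed.

```shell
# Hypothetical demo files, including one name with a space and one with a newline.
workdir=$(mktemp -d)
touch "$workdir/a" "$workdir/mid" "$workdir/two words" "$workdir/$(printf 'w\nlf.aa')"

# Emit "length<TAB>name" records terminated by NUL, sort them numerically,
# then cut away the length column. Assumes GNU sort -z and GNU cut -z.
sorted=$(
  cd "$workdir" &&
  for f in *; do printf '%d\t%s\0' "${#f}" "$f"; done |
    sort -z -n | cut -z -f2- | tr '\0' '|'
)
printf '%s\n' "$sorted"

rm -r "$workdir"
```

The newline inside the third name is still printed as a line break, but it stays between the same pair of '|' markers, so the name is sorted as a single 7-character entry.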
* Less-Easily Breakable (only '\n' and possibly command characters could be a problem)
1b. Perl à la @tchrist with find, not ls
Using find with null separator and xargs.
Command:
find . -maxdepth 1 -type f -print0 | xargs -I'{}' -0 echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | perl -e 'print sort { length($b) <=> length($a) } <>'
Alternate version of command that's easier to read:
find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | \
perl -e 'print sort { length($b) <=> length($a) } <>'
Let's go for it.
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | \
perl -e 'print sort { length($b) <=> length($a) } <>' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
17f09d51d6280fb8393d5f321f344f616c461a57a8b9cf9cc3099f906b567c992.txt
bballdave025@MY-MACHINE /home/bballdave025/orig_dir_73
$ find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | \
perl -e 'print sort { length($b) <=> length($a) } <>' | tail -8
IKlT.txt
Lk3f.png
LOqU.txt
137.csv
13.csv
o6.dat
3.csv
a.dat
Works for normal filenames
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces
$ find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | \
perl -e 'print sort { length($b) <=> length($a) } <>' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
just one file with a really long filename that can throw off some counts bla so there
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
Works for filenames containing spaces
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' | \
perl -e 'print sort { length($b) <=> length($a) } <>' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
just one file with a really long filename that can throw off some counts bla so there
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ find . -maxdepth 1 -type f -print0 | \
xargs -I'{}' -0 \
echo "{}" | sed 's#\./.*/\([^/]\+\)\./$#\1#g' |
perl -e 'print sort { length($b) <=> length($a) } <>' | tail -8
LOqU.txt
137.csv
13.csv
o6.dat
3.csv
a.dat
lf.aa
w
WARNING
BREAKS for filenames containing newlines
1c. Good for normal filenames and filenames with spaces, but breakable with filenames containing newlines - à la @tchrist
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ ls | perl -e 'print sort { length($b) <=> length($a) } <>' | head -n 5
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
just one file with a really long filename that can throw off some counts bla so there
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ ls | perl -e 'print sort { length($b) <=> length($a) } <>' | tail -8
LOqU.txt
137.csv
13.csv
o6.dat
3.csv
a.dat
lf.aa
w
3a. Good for normal filenames and filenames with spaces, but breakable with filenames containing newlines - à la @Peter_O
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ ls | awk '{print length($0)"\t"$0}' | sort -n | cut --complement -f1 | head -n 8
w
3.csv
a.dat
lf.aa
13.csv
o6.dat
137.csv
1UG5.txt
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ ls | awk '{print length($0)"\t"$0}' | sort -n | cut --complement -f1 | tail -5
83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
just one file with a really long filename that can throw off some counts bla so there
oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
* More-Easily Breakable
4a. Good for normal filenames - à la @Raghuram
This version is breakable with filenames containing either spaces or newlines (or both).
I do want to add that I do like the display of the actual string length, if just for analysis purposes.
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ for i in `ls`; do LEN=`expr length $i`; echo $LEN $i; done | sort -n | head -n 20
1 a
1 w
2 so
3 bla
3 can
3 off
3 one
4 file
4 just
4 long
4 some
4 that
4 with
5 3.csv
5 a.dat
5 lf.aa
5 there
5 throw
6 13.csv
6 counts
bballdave025@MY-MACHINE /home/bballdave025/dir_w_fnames__spaces_and_newlines
$ for i in `ls`; do LEN=`expr length $i`; echo $LEN $i; done | sort -n | tail -5
69 17f09d51d6280fb8393d5f321f344f616c461a57a8b9cf9cc3099f906b567c992.txt
70 83dfee2e0f8560dbd2a681a5a40225fd260d3b428b962dcfb75d17e43a5fdec9_1.txt
76 79496d6167652f71526c586e654345744a365939773d3d2d343538sfp6m8o1m53hlwlfja.dat
87 oinwrxK2ea1sfp6m8o49255f679496d6167652f71526c586e654345744a365939773d3d2d343538b3e0.csv
238 68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f71526c586e654345744a365939773d3d2d3435383139353437362e313464633462356336326266656365303439363432373931333139382e676966.txt
Explanation of Some Commands
For now, I'll only note that, with the works-for-all find command, I used '/' as the newline substitute because, apart from NUL, it is the only character that is illegal in a filename on both *NIX and Windows.
Note(s)
[ 1 ] The command used,
du --inodes --files0-from=<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=int($1)}END{print sum}'
will work in this case, because when there is a file with a newline, and therefore an "extra" line in the output of the find command, awk's int function will evaluate to 0 for the text of that line. Specifically, for our newline-containing filename, w\nlf.aa, i.e.
w
lf.aa
we will get
$ awk '{print int($1)}' < <(echo "lf.aa")
0
If you have a situation where the filename is something like
firstline\n3 and some other\n1\n2\texciting\n86stuff.jpg
i.e.
firstline
3 and some other
1
2 exciting
86stuff.jpg
well, I guess the computer has beaten me. If anyone has a solution, I'd be glad to hear it.
Edit: I think I'm way too deep into this question. From this SO answer and experimentation, I got this command (I don't understand all the details, but I've tested it pretty well):
awk -F"\0" '{print NF-1}' < <(find . -maxdepth 1 -type f -print0) | awk '{sum+=$1}END{print sum}'
More readably:
awk -F"\0" '{print NF-1}' < \
<(find . -maxdepth 1 -type f -print0) | \
awk '{sum+=$1}END{print sum}'
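A shorter variant of the same idea, offered only as a sketch: find -print0 writes exactly one NUL per file, so deleting every other byte and counting what is left gives the number of files, whatever characters the names contain. The scratch directory and file names below are made up for illustration.

```shell
# Hypothetical demo directory: three files (one with a space, one with a
# newline in its name) and one subdirectory.
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/two words" "$workdir/$(printf 'w\nlf.aa')"
mkdir "$workdir/subdir"   # excluded by -type f

# Keep only the NUL terminators (one per file) and count the remaining bytes.
file_count=$(find "$workdir" -maxdepth 1 -type f -print0 | tr -cd '\0' | wc -c)
echo "$file_count"

rm -r "$workdir"
```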
You can use
ls --color=never --indicator-style=none | awk '{print length, $0}' |
sort -n | cut -d" " -f2-
To see it in action, create some files
% touch a ab abc
and some directories
% mkdir d de def
Output of the normal ls command
% ls
a ab abc d/ de/ def/
Output from the proposed command
% ls --color=never --indicator-style=none | awk '{print length, $0}' |
sort -n | cut -d" " -f2-
a
d
ab
de
abc
def
