How to get file name and total count in zip (Linux) - unix

How do I get output that shows each zip file's name followed by its total file count? The command line below prints only the number of files, not the file names.
I would like each line to show the zip file name and the number of documents in that zip. Thanks
Desired output:
IAD1.zip 30000 files
IAD2.zip 24000 files
IAD3.zip 32000 files
.....
Command line:
zipinfo IAD${count}.zip |grep ^-|wc -l >>TotalCount.txt
With the command above, the result shows only the number of documents in each zip file:
30000
24000
32000
.....

zipinfo -h <file name> | tr '\n' ':' | awk -F':' '{print $2 , $5 , "files"}'
explanation:
zipinfo -h -- list the header lines. The archive name, actual size (in bytes) and total number of entries are printed.
tr '\n' ':' -- Replace new line with ":"
awk -F':' '{print $2 , $5 , "files"}' -- Read file as ":" delimited and print 2nd and 5th field
Demo:
:>zipinfo test.zip
Archive: test.zip
Zip file size: 2798 bytes, number of entries: 7
-rw-r--r-- 3.0 unx 18 tx stor 20-Mar-10 13:00 file1.dat
-rw-r--r-- 3.0 unx 32 tx defN 20-Mar-10 13:00 file2.dat
-rw-r--r-- 3.0 unx 16 tx stor 20-Mar-10 12:26 file3.dat
-rw-r--r-- 3.0 unx 1073 tx defN 20-Mar-12 05:24 join1.txt
-rw-r--r-- 3.0 unx 114 tx defN 20-Mar-12 05:25 join2.txt
-rw-r--r-- 3.0 unx 254 tx defN 20-Mar-11 09:39 sample.txt
-rw-r--r-- 3.0 unx 1323 bx stor 20-Mar-14 09:14 test,zip.zip
7 files, 2830 bytes uncompressed, 1746 bytes compressed: 38.3%
:>zipinfo -h test.zip | tr '\n' ':' | awk -F':' '{print $2 , $5 , "files"}'
test.zip 7 files
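To produce the report for several archives at once, the header parsing can be wrapped in a loop. The following is a minimal sketch rather than the answerer's exact command: the function names summarize_header and count_zip_entries are made up for illustration, and it assumes that archive names contain no spaces and that zipinfo -h prints the two header lines shown in the demo.

```shell
# Hypothetical helper: turn the two header lines printed by `zipinfo -h`
# (read from stdin) into "<archive> <count> files".
summarize_header() {
    awk '/^Archive:/ { name = $2 }
         /number of entries:/ { print name, $NF, "files" }'
}

# Hypothetical helper: print one summary line per existing archive argument.
count_zip_entries() {
    for zip in "$@"; do
        [ -f "$zip" ] || continue      # skip unmatched globs and missing files
        zipinfo -h "$zip" | summarize_header
    done
}
```

It could be called as count_zip_entries IAD*.zip >> TotalCount.txt. Note that "number of entries" in the header includes directory entries, whereas the grep ^- in the question counts regular files only, so the two counts may differ for archives that store directories.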

Related

grep and awk, combine commands?

I have file that looks like:
This is a RESTRICTED site.
All connections are monitored and recorded.
Disconnect IMMEDIATELY if you are not an authorized user!
sftp> cd outbox
sftp> ls -ltr
-rw------- 1 0 0 1911 Jun 12 20:40 61N0584832_EDIP000749728818_MFC_20190612203409.txt
-rw------- 1 0 0 1878 Jun 13 06:01 613577165_EDIP000750181517_MFC_20190613055207.txt
I want to print only the .txt file names, ideally in one command.
I can do:
grep -e '^-' outfile.log > outfile.log2
..which gives only the lines that start with '-'.
-rw------- 1 0 0 1911 Jun 12 20:40 61N0584832_EDIP000749728818_MFC_20190612203409.txt
-rw------- 1 0 0 1878 Jun 13 06:01 613577165_EDIP000750181517_MFC_20190613055207.txt
And then:
awk '{print $9}' outfile.log2 > outfile.log3
..which gives the desired output:
61N0584832_EDIP000749728818_MFC_20190612203409.txt
613577165_EDIP000750181517_MFC_20190613055207.txt
And so the question is, can these 2 commands be combined into 1?
You may use a single awk:
awk '/^-/{ print $9 }' file > outputfile
Or
awk '/^-/{ print $9 }' file > tmp && mv tmp file
It works like this:
/^-/ - finds each line starting with -
{ print $9 } - prints Field 9 of the matching lines only.
It seems that matching the leading - is not really what you want. If you just want the .txt files as output, filter on the file name:
awk '$9 ~ /\.txt$/{print $9}' input-file
Using grep with PCRE enabled (-P) flag:
grep -oP '^-.* \K.*' outfile.log
61N0584832_EDIP000749728818_MFC_20190612203409.txt
613577165_EDIP000750181517_MFC_20190613055207.txt
'^-.* \K.*' : everything from the leading - up to the last whitespace is matched but discarded (anything to the left of \K is matched and dropped from the output), and the part matched to the right of \K is printed.
Since the asker clearly writes "I want to print only the .txt file names", we should test for .txt files. And since the file name is always the last column, we can make the command more portable by testing only the last field, like this:
awk '$NF ~ /\.txt$/{print $NF}' outfile.log > outfile.log2
61N0584832_EDIP000749728818_MFC_20190612203409.txt
613577165_EDIP000750181517_MFC_20190613055207.txt
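The two filters discussed above (regular-file lines starting with - and names ending in .txt) can also be combined in a single awk condition. This is only a sketch: txt_names is a hypothetical function name, and it assumes file names contain no spaces, so that the name is always the last field.

```shell
# Hypothetical helper: read a listing from stdin and print the last field of
# every line that starts with "-" (regular file) and ends in ".txt".
txt_names() {
    awk '/^-/ && $NF ~ /\.txt$/ { print $NF }'
}
```

For the log in the question this would be run as txt_names < outfile.log > outfile.log2. Directories or banner lines that happen to end in .txt are skipped because they do not start with -.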

How to compare html files in unix

I have two folders with a huge number of HTML files. I want to compare each file with its counterpart in the other folder and get the diff of each pair using a shell script or Unix commands.
Example:
Directory 1:
1.html
2.html
3.html
Directory 2:
1.html
2.html
3.html..
I want to compare 1.html in Directory 1 with 1.html in Directory 2, 2.html with 2.html, and so on.
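Try this:
#!/bin/bash
# Usage: ./test.sh DIR1 DIR2
# Compare every *.html file in DIR1 with the file of the same name in DIR2.
for file in "$1"/*.html; do
    [ -e "$file" ] || continue    # glob matched nothing
    fileName=$(basename "$file")
    if [ ! -f "$2/$fileName" ]; then
        echo "$fileName not found! in $2"
    else
        # Number of lines that diff prints (0 means the files are identical)
        difLineCount=$(diff "$file" "$2/$fileName" | wc -l)
        if [ "$difLineCount" -eq 0 ]; then
            echo "$file is same $2/$fileName"
        else
            echo "$file is not same $2/$fileName . $difLineCount lines are different"
            #diff "$file" "$2/$fileName"
        fi
    fi
done
# Report files that exist only in DIR2.
for file in "$2"/*.html; do
    [ -e "$file" ] || continue
    fileName=$(basename "$file")
    if [ ! -f "$1/$fileName" ]; then
        echo "$fileName not found! in $1"
    fi
done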
Ex :
user#host:/tmp$ ./test.sh Directory_1 Directory_2
Directory_1/1.html is same Directory_2/1.html
Directory_1/2.html is same Directory_2/2.html
Directory_1/3.html is not same Directory_2/3.html . 4 lines are different
4.html not found! in Directory_2
5.html not found! in Directory_1
user#host:/tmp$ ls -alrt Directory_1/
total 20
-rw-rw-r-- 1 user user 6 Ağu 11 13:28 1.html
-rw-rw-r-- 1 user user 6 Ağu 11 13:28 2.html
-rw-rw-r-- 1 user user 6 Ağu 11 13:28 3.html
-rw-rw-r-- 1 user user 0 Ağu 11 13:41 4.html
drwxrwxr-x 2 user user 4096 Ağu 11 13:41 .
drwxrwxr-x 4 user user 4096 Ağu 11 13:48 ..
user#host:/tmp$ ls -alrt Directory_2/
total 20
-rw-rw-r-- 1 user user 7 Ağu 11 13:28 3.html
-rw-rw-r-- 1 user user 6 Ağu 11 13:28 2.html
-rw-rw-r-- 1 user user 6 Ağu 11 13:28 1.html
-rw-rw-r-- 1 user user 0 Ağu 11 13:44 5.html
drwxrwxr-x 2 user user 4096 Ağu 11 13:44 .
drwxrwxr-x 4 user user 4096 Ağu 11 13:48 ..
You can use comm command to compare sorted files. This might be the actual solution you are looking for.
Syntax: comm file1 file2
Compare sorted files FILE1 and FILE2 line-by-line.
With no options, comm produces three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files. Each of these columns can be suppressed individually with options.
Run the comm command in a loop so that it compares all the files in the required directories.
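A loop around comm might look like the sketch below. It is an illustration under stated assumptions, not the answerer's code: comm_dirs is a hypothetical function name, and because comm needs sorted input the files are sorted into temporary copies first, which means the result tells you how many lines differ but not where they occur.

```shell
# Hypothetical helper: for every *.html file in dir1 that also exists in dir2,
# count the lines that appear in only one of the two files.
comm_dirs() {
    dir1=$1
    dir2=$2
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    for f in "$dir1"/*.html; do
        name=${f##*/}
        [ -f "$f" ] && [ -f "$dir2/$name" ] || continue
        # comm requires sorted input, so sort copies of both files first
        sort "$f" > "$tmp1"
        sort "$dir2/$name" > "$tmp2"
        # -3 drops the common column, leaving lines unique to either file
        n=$(comm -3 "$tmp1" "$tmp2" | awk 'END { print NR }')
        echo "$name: $n lines differ"
    done
    rm -f "$tmp1" "$tmp2"
}
```

Calling comm_dirs Directory_1 Directory_2 prints one line per file pair. Files that exist in only one directory are skipped here.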
If you want to compare files one by one manually:
Use the diff command to display the line-by-line difference between two files.
Syntax: diff path/FILE1 path/FILE2
You can use --changed-group-format and --unchanged-group-format options to filter required data.
The following three values can be used to select the relevant group for each option:
'%<' get lines from FILE1
'%>' get lines from FILE2
'' (empty string) for removing lines from both files.
Example: diff --changed-group-format="%<" --unchanged-group-format="" file1 file2
You can get a clear-cut visual difference between two text files using the command sdiff:
Syntax: sdiff path/file1 path/file2
If you have the vim editor installed, you can also use:
vim -d file1 file2

Format output of concatenating 2 variables in unix

I am writing a simple shell script that checks the space of a target path and the space used by each directory under it (for example, it checks the space of /path1/home and how much of the total each folder under /path1/home consumes). My question is about the output it produces: the spacing is uneven, so it is not pleasing to the eye. See the sample output lines below.
SIZE USER_FOLDER DATE_LAST_MODIFIED
83G FOLDER 1 Apr 15 03:45
34G FOLDER 10 Mar 9 05:02
26G FOLDER 11 Mar 29 13:01
8.2G FOLDER 100 Apr 1 09:42
1.8G FOLDER 101 Apr 11 13:50
1.3G FOLDER 110 Feb 16 09:30
I just want the output format to be in line with the header so it will look neat because I will use it as a report. Here is the code I am using for this part.
ls -1 | grep -v "lost+found" |grep -v "email_body.tmp" > $v_path/Users.tmp
for user in `cat $v_path/Users.tmp | grep -v "Users.tmp"`
do
folder_size=`du -sh $user 2>/dev/null` # should be run using a more privileged user so that other folders can be read (2>/dev/null was used to discard error messages i.e. "du: cannot read directory `./marcnad/.gnupg': Permission denied")
folder_date=`ls -ltr | tr -s " " | cut -f6,7,8,9, -d" " | grep -w $user | cut -f1,2,3, -d" "`
folder_size="$folder_size $folder_date"
echo $folder_size >> $v_path/Users_Usage.tmp
done
echo "Summary of $v_path Disk Space Utilization per folder." >> email_body.tmp
echo "" >> email_body.tmp
echo "SIZE USER_FOLDER DATE_LAST_MODIFIED" >> email_body.tmp
for i in T G M K
do
cat $v_path/Users_Usage.tmp | grep [0-9]$i | sort -nr -k 1 >> $v_path/email_body.tmp
done
Thanks!
EDIT: Formatting
When you print the data, use printf instead of echo:
while read -r a b c d e f
do
    # Left-align each field in a fixed-width column, one record per line
    printf '%-5s%-7s%-4s%-4s%-3s%-6s\n' "$a" "$b" "$c" "$d" "$e" "$f"
done < "$v_path/Users_Usage.tmp"
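To keep the header in line with the data, the header can be printed with the same column widths as the rows. The sketch below assumes each input line has exactly five whitespace-separated fields (size, folder name without spaces, month, day, time). The function name print_report and the chosen widths are illustrative only.

```shell
# Hypothetical helper: read "size folder month day time" lines from stdin and
# print them under a header that uses the same column widths.
print_report() {
    printf '%-6s %-12s %s\n' SIZE USER_FOLDER DATE_LAST_MODIFIED
    while read -r size folder month day time; do
        # %-Ns left-aligns in N columns; %2s right-aligns the day of month
        printf '%-6s %-12s %-3s %2s %s\n' "$size" "$folder" "$month" "$day" "$time"
    done
}
```

In the asker's script this could replace the plain echo of the header, for example print_report < "$v_path/Users_Usage.tmp" >> email_body.tmp. Folder names longer than 12 characters would still push the date column to the right, so the width should be adjusted to the longest expected name.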

How to find count of a particular word in Different Files in Unix

How do I find the count of a particular word in different files in Unix?
I have: 50 files in a directory (abc.txt, abc.txt.1, abc.txt.2, etc.)
What I want: to find the number of instances of the word 'Hello' in each file.
What I have used is grep -c Hello abc* | grep -v :0
It gave me results in the form:
<<File name>> : <<count>>
I want the output to be in the form:
<<Date>> <<File_Name>> <<Number of Instances of word Hello in the file>>
1-1-2001 abc.txt 23
1-1-2014 abc.txt.19 57
2-5-2015 abc.txt.49 16
You can use GNU awk 4.0 or later (needed for ENDFILE) to get the count for each file.
If you tell us where the date comes from, I can add it to the output.
awk '{for (i=1;i<=NF;i++) if ($i~/Hello/) a++} ENDFILE {print FILENAME,a;a=0}' abc.txt*
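The date column could be added along the following lines. This is a sketch under assumptions the thread does not confirm: the date is taken to be the file's modification time, date -r FILE is the GNU form that reads a file's timestamp, and count_word is a hypothetical function name. Unlike the awk command above, which counts every field containing Hello, grep -o -w counts whole-word matches only.

```shell
# Hypothetical helper: count_word WORD FILE...
# Prints "<dd-mm-yyyy> <file> <count>" for each file containing WORD.
count_word() {
    word=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        # grep -o prints each whole-word match on its own line; count the lines
        n=$(grep -o -w -- "$word" "$f" | awk 'END { print NR }')
        [ "$n" -gt 0 ] || continue      # skip zero counts, like grep -v :0
        # Assumption: the wanted date is the file's modification time (GNU date)
        printf '%s %s %s\n' "$(date -r "$f" +%d-%m-%Y)" "$f" "$n"
    done
}
```

It would be called as count_word Hello abc.txt*. The date is zero-padded (01-01-2001 rather than 1-1-2001).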
### Sample code for you to tweak for your needs:
touch test.txt
echo "ravi chandran marappan 30" > test.txt
echo "ramesh kumar marappan 24" >> test.txt
echo "ram lakshman marappan 22" >> test.txt
sed -e 's/ /\n/g' test.txt | sort | uniq | awk '{print "echo """,$1,
"""`grep -wc ",$1," test.txt`"}' | sh
Results:
22 -1
24 -1
30 -1
chandran -1
kumar -1
lakshman -1
marappan -3
ram -1
ramesh -1
ravi -1

Can I use modified date and time as suffix in rsync?

I'm using rsync to synchronize folders, and make backups of existing files if they have been modified. Currently all modified files are backed to a separate directory, with the synchronization time as suffix. This is with the following command:
rsync --times --backup --backup-dir=OldVersions --suffix=`date +'.%y%m%d%H%M'` /SourceDir /DestDir
Now what I would like to do is use the modified date and time of each file that has to be backed up, instead of the time of the synchronization. Any ideas how I would be able to achieve this?
The approach below may work - it was tested on a Linux system.
Run another script against the OldVersions directory after each sync. The rsync suffix gets an extra _ZZZZ marker at the end of the file name so that find can select the new backups, and the script then replaces that ending with the modification timestamp of the file.
Script to rename the files using their modification timestamp (rename.sh):
#!/usr/bin/env bash
name=$1
# Get the file's modification time, replace the space with an underscore, and remove "-" and ":"
modtime=$(stat "$name" | grep Modify | cut -d ' ' -f 2,3 | sed -e 's/ /_/g' -e 's/[-:]//g')
#echo "$modtime"
# Replace the 10-digit run time and the _ZZZZ marker with the modification time
newname=$(echo "$name" | sed -e "s/[[:digit:]]\{10\}_ZZZZ$/$modtime/")
#echo "$newname"
mv "$name" "$newname"
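The same rename can be sketched without parsing stat output, using date -r (GNU date, which reads a file's modification time) and shell parameter expansion. This is a variation for illustration, not the script used in the transcript below: rename_backup is a hypothetical function name, and the timestamp is cut to whole seconds, so two versions modified within the same second would collide.

```shell
# Hypothetical helper: rename "<name>.<run time>_ZZZZ" to
# "<name>.<YYYYmmdd_HHMMSS of the file's modification time>".
rename_backup() {
    old=$1
    case $old in
        *_ZZZZ) ;;              # only touch files still carrying the marker
        *) return 0 ;;
    esac
    stamp=$(date -r "$old" +%Y%m%d_%H%M%S) || return 1
    base=${old%.*_ZZZZ}         # strip the final ".<run time>_ZZZZ" part
    mv -- "$old" "$base.$stamp"
}
```

Files that no longer end in _ZZZZ are left alone, so running it twice over the same directory has no further effect.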
Added the rsync options -r to recurse into directories and -i to show some information.
user1#debian10 /home/user1/test > rsync --backup-dir=OldVersions --suffix=`date +'.%y%m%d%H%M_ZZZZ'` -btri src dest
cd+++++++++ src/
>f+++++++++ src/azvltfexishlm.txt
>f+++++++++ src/dhatkfztklgcan.txt
>f+++++++++ src/feftafvfrdepwezl.txt
>f+++++++++ src/fwclodehxlpg.txt
>f+++++++++ src/ijcjftigjqhxhan.txt
>f+++++++++ src/jdlfsxoinuey.txt
>f+++++++++ src/oljmsfjv.txt
>f+++++++++ src/rbrktqqrtjyxyt.txt
>f+++++++++ src/rqheczjqrjulvlia.txt
>f+++++++++ src/ruyeizqrxstu.txt
>f+++++++++ src/ssuwndmrellunqyq.txt
>f+++++++++ src/vaclfgwqfdihmvis.txt
Ran again after I appended to the files below.
user1#debian10 /home/user1/test > rsync --backup-dir=OldVersions --suffix=`date +'.%y%m%d%H%M_ZZZZ'` -btri src dest
>f.st...... src/azvltfexishlm.txt
>f.st...... src/fwclodehxlpg.txt
>f.st...... src/ijcjftigjqhxhan.txt
>f.st...... src/ruyeizqrxstu.txt
Ran again after I appended to the files below
user1#debian10 /home/user1/test > rsync --backup-dir=OldVersions --suffix=`date +'.%y%m%d%H%M_ZZZZ'` -btri src dest
>f.st...... src/fwclodehxlpg.txt
>f.st...... src/rbrktqqrtjyxyt.txt
>f.st...... src/rqheczjqrjulvlia.txt
Using find to call the script on the backup files:
user1#debian10 /home/user1/test/dest > find . -name "*_ZZZZ" -print -exec ~/bin/rename.sh {} \;
./OldVersions/src/rqheczjqrjulvlia.txt.2110231948_ZZZZ
./OldVersions/src/azvltfexishlm.txt.2110231945_ZZZZ
./OldVersions/src/rbrktqqrtjyxyt.txt.2110231948_ZZZZ
./OldVersions/src/fwclodehxlpg.txt.2110231948_ZZZZ
./OldVersions/src/ruyeizqrxstu.txt.2110231945_ZZZZ
./OldVersions/src/ijcjftigjqhxhan.txt.2110231945_ZZZZ
./OldVersions/src/fwclodehxlpg.txt.2110231945_ZZZZ
./OldVersions/src/ssuwndmrellunqyq.txt.2110231948_ZZZZ
Listing the renamed files.
user1#debian10 /home/user1/test/dest > ls -lR OldVersions/
OldVersions/:
total 4
drwxr-xr-x 2 user1 user1 4096 Oct 23 20:24 src
OldVersions/src:
total 32
-rw-r--r-- 1 user1 user1 1322 Oct 23 19:42 azvltfexishlm.txt.20211023_194252.673598165
-rw-r--r-- 1 user1 user1 2255 Oct 23 19:42 fwclodehxlpg.txt.20211023_194252.673598165
-rw-r--r-- 1 user1 user1 3084 Oct 23 19:45 fwclodehxlpg.txt.20211023_194506.313540547
-rw-r--r-- 1 user1 user1 1178 Oct 23 19:42 ijcjftigjqhxhan.txt.20211023_194252.673598165
-rw-r--r-- 1 user1 user1 485 Oct 23 19:42 rbrktqqrtjyxyt.txt.20211023_194252.673598165
-rw-r--r-- 1 user1 user1 2283 Oct 23 19:42 rqheczjqrjulvlia.txt.20211023_194252.673598165
-rw-r--r-- 1 user1 user1 1579 Oct 23 19:42 ruyeizqrxstu.txt.20211023_194252.673598165
-rw-r--r-- 1 user1 user1 2091 Oct 23 19:42 ssuwndmrellunqyq.txt.20211023_194252.673598165
After each backup run, run the rename script. It should be idempotent, because files that have already been renamed no longer end in _ZZZZ.
Test script for file creation:
#!/usr/bin/env ruby
class Foo
def initialize()
end
def random_string(len)
pattern = "abcdefghijklmnopqrstuvwxyz"
accum = ""
if len == 0
return ""
end
(0..len-1).each do | i |
index = rand(0..25)
accum << pattern[index]
end
return accum
end
def append_data_to_file(fname)
fout = File.open(fname, "a")
num_lines = rand(5..30)
(1..num_lines).each do | lineno |
rand_string = random_string(rand(24..70))
fout.puts(rand_string)
end
fout.close
end
def create_rand_files(path)
Dir.chdir(path) do
num_files = rand(10..20)
puts "file count #{num_files}"
(1..num_files).each do | i |
name_len = rand(8..16)
rand_name = random_string(name_len) + ".txt"
puts "file #{i} name #{rand_name}"
append_data_to_file(rand_name)
end
end
end
def modify_files(path)
arr = Dir.entries(path).select { |x| x != "." && x != ".."}.shuffle
subsize = arr.length / 3
(1..subsize).each do
fname = arr.pop
puts fname
append_data_to_file(File.join(path, fname))
end
end
end
def main
begin
if ARGV.length != 2
puts "use: #{$0} path cmd"
exit
end
path,cmd = ARGV
b = Foo.new
case cmd
when "create"
b.create_rand_files(path)
when "modify"
b.modify_files(path)
else
puts "unknown command #{cmd}"
end
rescue StandardError => e
p e
p e.backtrace.inspect
end
end
main
