Prevent creating empty Tar file in AIX Unix

I have a requirement to tar the files (into a single tar file) listed in a plain text file.
How can I prevent creating a tar file if the text file containing the file list is empty?

Depending on how the file is created, it can have 0, 1 or more lines. If it has 0 lines, everything is clear: you have no files:
l=$(cat filelist.txt | wc -l)
if [ "$l" -eq 0 ]
then
    echo "No files in the list"
    exit 1
fi
If it has 1 line, it can be either just a newline or exactly one file name. You can check that like this:
l=$(cat filelist.txt | wc -l)
if [ "$l" -eq 1 ]
then
    if [ ! -e "$(cat filelist.txt)" ]
    then
        echo "No files in the list"
        exit 1
    fi
fi
If you want to do it in one line, you can use something like:
tar cvf tarfile.tar $(cat filelist.txt) || rm tarfile.tar
or, if you want to suppress all the messages, something like:
tar cvf tarfile.tar $(cat filelist.txt) >/dev/null 2>&1 || rm tarfile.tar
This command will create the tar file from the list in filelist.txt, and if something goes wrong, such as an empty list (or running out of disk space), it will remove the tar file.
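Putting both checks together, a minimal sketch (assuming one file name per line, no spaces in the names, and the target archive tarfile.tar as above):
#!/bin/sh
# Count only non-empty lines that name existing files.
count=0
while IFS= read -r f
do
    [ -n "$f" ] && [ -e "$f" ] && count=$((count + 1))
done < filelist.txt

if [ "$count" -eq 0 ]
then
    echo "No files in the list" >&2
    exit 1
fi

# Create the archive; if tar still fails (e.g. out of disk space),
# remove the partial tar file.
tar cvf tarfile.tar $(cat filelist.txt) || rm -f tarfile.tar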

Related

Rsync skip folder based on wildcard

Script:
ash-4.4# cat rsync-backup.sh
#!/bin/sh
# Usage: rsync-backup.sh <src> <dst> <label>
if [ "$#" -ne 3 ]; then
echo "$0: Expected 3 arguments, received $#: $#" >&2
exit 1
fi
if [ -d "$2/__prev/" ]; then
rsync -azP --delete --link-dest="$2/__prev/" "$1" "$2/$3"
else
rsync -azP "$1" "$2/$3"
fi
rm -f "$2/__prev"
ln -s "$3" "$2/__prev"
How can I change this so that it skips specific folders based on a wildcard?
These folders should always be skipped:
home/forge/*/storage/framework/cache/*
home/forge/*/vendor
home/forge/*/node_modules
But how can this be achieved? What to change in the original rsync-backup.sh file?
This is not working:
rsync -azP "$1" "$2/$3" --exclude={'node_modules', 'cache','.cache','.npm','vendor','.git'}
The --exclude={'dir1','dir2',...} form does not work under the sh shell; it only works under bash, because it relies on bash's brace expansion.
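You can see the difference without rsync at all; a quick demonstration (assuming sh is dash or BusyBox ash, as in the script above; if sh is actually bash, both lines will expand):
$ bash -c "echo --exclude={'node_modules','vendor'}"
--exclude=node_modules --exclude=vendor
$ sh -c "echo --exclude={'node_modules','vendor'}"
--exclude={node_modules,vendor}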
Your options are:
use bash, then the --exclude={'node_modules', 'cache','.cache','.npm','vendor','.git'} will work.
use multiple --exclude switches, one per directory. For example: rsync <params> --exclude='node_modules' --exclude='cache' --exclude='.cache' ...
use --exclude-from, where you have a text file with the list of excluded directories (a full sketch of the modified script follows this list). Like:
rsync <params> --exclude-from='/home/user/excluded_dir_list.txt' ...
The file excluded_dir_list.txt would contain one excluded dir per line, like:
node_modules
cache
.cache
.npm
vendor
.git
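Applied to the original rsync-backup.sh, a minimal sketch of the third option (using the same /home/user/excluded_dir_list.txt path as above; adjust it to your setup):
#!/bin/sh
# Usage: rsync-backup.sh <src> <dst> <label>
# Same logic as the original script, with an exclude list added to both rsync calls.
EXCLUDE_LIST=/home/user/excluded_dir_list.txt
if [ "$#" -ne 3 ]; then
    echo "$0: Expected 3 arguments, received $#: $*" >&2
    exit 1
fi
if [ -d "$2/__prev/" ]; then
    rsync -azP --delete --exclude-from="$EXCLUDE_LIST" --link-dest="$2/__prev/" "$1" "$2/$3"
else
    rsync -azP --exclude-from="$EXCLUDE_LIST" "$1" "$2/$3"
fi
rm -f "$2/__prev"
ln -s "$3" "$2/__prev"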

Iterate over directories and perform tasks within each directory

I hope someone can help me with a bash script that does the following:
Iterate over all directories in a path
In each directory a) rename a file with name starting with 'jpt' to the directory name, b) move the renamed file to parent directory, c) and then delete the directory.
So, basically I have some folders which each have a file starting with 'jpt'. The file name is the same in all the folders. I want to replace the folders with the files; renaming the files is what keeps them distinct.
Thank you in advance!
Krishna
Here is a script that does what I understand:
#!/bin/dash
set -e

# Recursively walk fromDir; when a jpt* file is found, replace the
# directory with that file, renamed to the directory's own name.
mvJtp() {
    local fromDir="$1"
    local f
    for f in "$fromDir"/*
    do
        if [ -d "$f" ]
        then
            mvJtp "$f"
        elif [ -f "$f" ]
        then
            case "$f" in
            "$fromDir"/jpt*)
                mv -n "$f" "$fromDir".tmp         # move the file next to its directory
                rmdir "$fromDir"                  # the directory must now be empty
                mv -n "$fromDir".tmp "$fromDir"   # give the file the directory's name
                return 0
                ;;
            esac
        fi
    done
}

mvJtp jptSrc
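If the directories all sit directly under one parent and each contains exactly one jpt* file (as the question describes), a flat, non-recursive sketch under those assumptions could be:
#!/bin/sh
# Replace each subdirectory of the current directory with the jpt* file
# it contains, renaming the file to the directory's name.
for d in */
do
    d=${d%/}                        # directory name without the trailing slash
    set -- "$d"/jpt*                # the jpt* file(s) inside it
    [ -f "$1" ] || continue         # skip directories without a jpt file
    mv -n "$1" "$d.tmp"             # move the file out under a temporary name
    rmdir "$d" && mv -n "$d.tmp" "$d"   # replace the (now empty) directory with the file
done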

Unix Create Directories Based on File name and Move Files to the Directories

I'm trying to write a Unix script to create directories based on file names and move those files to the designated directories.
File pattern:
*PLAIN*nn.pdf (e.g. 4520009455604706_PLAIN_1221.pdf)
Directories to be created: Cynn (e.g. Cy21)
[NOTE: Need a step to check if directory exists, if not, then create new directory]
After creating the above directories, I need to move all files matching *PLAIN*21.pdf to the directory /Cy21.
[EDITED] Solution added below.
My solution is like this:
#!/bin/sh
for file in *.pdf
do
    if test -s "$file"
    then
        # take the last two characters before ".pdf", e.g. "21" in ..._1221.pdf
        cycle=`echo "$file" | awk -F'.' '{print $1}' | awk '{print substr($0,(length($0)-1))}'`
        dir="./Cy$cycle"
        if [ -d "$dir" ]
        then
            mv "$file" "$dir"
        else
            mkdir "$dir"
            mv "$file" "$dir"
        fi
    else
        echo "File error"
        echo "$file"
    fi
done
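A slightly shorter sketch of the same idea; it relies on mkdir -p (standard in POSIX sh), so the exists-check is no longer needed:
#!/bin/sh
# Move each *PLAIN*.pdf into Cy<nn>, where <nn> is the last two characters
# of the name before ".pdf"; create the directory on demand.
for file in *PLAIN*.pdf
do
    if [ -s "$file" ]
    then
        base=${file%.pdf}
        cycle=$(printf '%s' "$base" | awk '{print substr($0, length($0)-1)}')
        dir="./Cy$cycle"
        mkdir -p "$dir"          # no-op if the directory already exists
        mv "$file" "$dir"
    else
        echo "File error"
        echo "$file"
    fi
done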

UNIX untar content into multiple folders

I have a tar.gz file about 13GB in size. It contains about 1.2 million documents. When I untar it, all these files sit in one single directory, and any read from that directory takes ages. Is there any way I can split the files from the tar into multiple new folders?
e.g.: I would like to create new folders named [1,2,...] each having 1000 files.
This is a quick and dirty solution but it does the job in Bash without using any temporary files.
i=0   # file counter
dir=0 # folder name counter
mkdir $dir

tar -tzvf YOURFILE.tar.gz |
cut -d ' ' -f12 |   # get the filenames contained in the archive
while read filename
do
    i=$((i+1))
    if [ $i == 1000 ]   # new folder for every 1000 files
    then
        i=0             # reset the file counter
        dir=$((dir+1))
        mkdir $dir
    fi
    tar -C $dir -xvzf YOURFILE.tar.gz $filename
done
Same as a one liner:
i=0; dir=0; mkdir $dir; tar -tzvf YOURFILE.tar.gz | cut -d ' ' -f12 | while read filename; do i=$((i+1)); if [ $i == 1000 ]; then i=0; dir=$((dir+1)); mkdir $dir; fi; tar -C $dir -xvzf YOURFILE.tar.gz $filename; done
Depending on your tar version and locale, the "cut -d ' ' -f12" part for retrieving the last column (the filename) of tar's content output could cause a problem, and you would have to modify it.
It worked with 1000 files but if you have 1.2 million documents in the archive, consider testing this with something smaller first.
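If your tar prints one pathname per line with -t and no v flag (GNU tar does), the fragile cut can be dropped entirely; a variant of the same loop:
i=0
dir=0
mkdir $dir

# tar -tzf (no v) lists just the pathnames, one per line
tar -tzf YOURFILE.tar.gz |
while read -r filename
do
    i=$((i+1))
    if [ $i -eq 1000 ]   # new folder for every 1000 files
    then
        i=0
        dir=$((dir+1))
        mkdir $dir
    fi
    tar -C $dir -xzf YOURFILE.tar.gz "$filename"
done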
Obtain the file name list with --list
Split that list into smaller files with grep
Untar only those files using --files-from
Thus:
tar --list --file=archive.tar > allfiles.txt
grep '^1' allfiles.txt > files1.txt
tar -xvf archive.tar --files-from=files1.txt
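To get the "about 1000 files per folder" layout from the question, the list can be chunked with split instead of grep; a sketch assuming GNU tar and GNU split (for the -d and -a options):
tar --list --file=archive.tar > allfiles.txt
split -d -a 4 -l 1000 allfiles.txt chunk.   # chunk.0000, chunk.0001, ... with up to 1000 names each

n=0
for list in chunk.*
do
    mkdir -p "$n"
    tar -C "$n" -xf archive.tar --files-from="$list"
    n=$((n+1))
done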
If you have GNU tar you might be able to make use of the --checkpoint and --checkpoint-action options. I have not tested this, but I'm thinking something like:
# UNTESTED
cd /base/dir
mkdir $(printf "dir%04d\n" {1..1500}) # probably more than you need
ln -s dir0001 linkname
tar -C linkname ... --checkpoint=1000 \
    --checkpoint-action='sleep=1' \
    --checkpoint-action='exec=ln -snf dir%04u linkname' ...
You can look at the man page and see if there are options like that. Worst comes to worst, just extract the files you need (maybe using --exclude) and put them into your folders.
tar doesn't provide that capability directly. It only restores its files into the same structure from which it was originally generated.
Can you modify the source directory to create the desired structure there and then tar the tree? If not, you could untar the files as they are in the archive and then post-process that directory with a script that moves the files into the desired arrangement (a sketch follows). Given the number of files, this will take some time, but at least it can be done in the background.
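For that post-processing step, a minimal sketch (assuming the archive was already extracted flat into ./extracted and plain numeric folder names of at most 1000 files each are wanted):
#!/bin/sh
# Move the flat set of files into numbered subdirectories (1, 2, 3, ...)
# of at most 1000 files each.
cd extracted || exit 1
i=0
dir=1
mkdir -p "$dir"
for f in *
do
    [ -f "$f" ] || continue       # skip the numbered directories themselves
    mv "$f" "$dir/"
    i=$((i + 1))
    if [ "$i" -ge 1000 ]          # start a new folder every 1000 files
    then
        i=0
        dir=$((dir + 1))
        mkdir -p "$dir"
    fi
done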

Performing grep operation in tar files without extracting

I have a list of files which contain particular patterns, but those files have been tarred. Now I want to search for the pattern in the tar file and find out which files contain it, without extracting the files.
Any idea...?
The tar command has a -O switch to extract your files to standard output, so you can pipe that output to grep/awk:
tar xvf test.tar -O | awk '/pattern/{print}'
tar xvf test.tar -O | grep "pattern"
e.g. to return the file name when the pattern is found:
tar tf test.tar | while read -r FILE
do
    if tar xf test.tar "$FILE" -O | grep "pattern"; then
        echo "found pattern in : $FILE"
    fi
done
The command zgrep should do exactly what you want, directly.
for example
zgrep "mypattern" *.gz
http://linux.about.com/library/cmd/blcmdl1_zgrep.htm
GNU tar has --to-command. With it you can have tar pipe each file from the archive into the given command. For the case where you just want the lines that match, that command can be a simple grep. To know the filenames you need to take advantage of tar setting certain variables in the command's environment; for example,
tar xaf thing.tar.xz --to-command="awk -e '/thing.to.match/ {print ENVIRON[\"TAR_FILENAME\"] \":\", \$0}'"
Because I find myself using this often, I have this:
#!/bin/sh
set -eu

if [ $# -lt 2 ]; then
    echo "Usage: $(basename "$0") <pattern> <tarfile>"
    exit 1
fi

if [ -t 1 ]; then
    h="$(tput setf 4)"
    m="$(tput setf 5)"
    f="$(tput sgr0)"
else
    h=""
    m=""
    f=""
fi

tar xaf "$2" --to-command="awk -e '/$1/{gsub(\"$1\", \"$m&$f\"); print \"$h\" ENVIRON[\"TAR_FILENAME\"] \"$f:\", \$0}'"
This can be done with tar --to-command and grep --label:
tar xaf archive.tar.gz --to-command 'egrep -Hn --label="$TAR_FILENAME" your_pattern_here || true'
--label gives grep the filename
-H tells grep to display the filename, and -n the line number
|| true because otherwise grep will exit with an error if the pattern is not found, and tar will complain about that.
xaf means to extract, and automagically decompress based on the file extension
--to-command has tar pass each file in the tarfile to a separate invocation of grep, and sets various environment variables with info about the file. See the manpage for more info.
Pretty heavily based on Chipaca's answer (and Daniel H's comment), but this should be a bit easier to use and just uses tar and grep.
Python's tarfile module along with TarFile.extractfile() will allow you to inspect the tarball's contents without extracting it to disk.
The easiest way is probably to use avfs. I've used this before for such tasks.
Basically, the syntax is:
avfsd ~/.avfs # Sets up an avfs virtual filesystem
rgrep pattern ~/.avfs/path/to/file.tar#/
/path/to/file.tar is the path to the actual tar file.
Pre-pending ~/.avfs/ (the mount point) and appending # lets avfs expose the tar file as a directory.
That's actually very easy with ugrep option -z:
-z, --decompress
Decompress files to search, when compressed. Archives (.cpio,
.pax, .tar, and .zip) and compressed archives (e.g. .taz, .tgz,
.tpz, .tbz, .tbz2, .tb2, .tz2, .tlz, and .txz) are searched and
matching pathnames of files in archives are output in braces. If
-g, -O, -M, or -t is specified, searches files within archives
whose name matches globs, matches file name extensions, matches
file signature magic bytes, or matches file types, respectively.
Supported compression formats: gzip (.gz), compress (.Z), zip,
bzip2 (requires suffix .bz, .bz2, .bzip2, .tbz, .tbz2, .tb2, .tz2),
lzma and xz (requires suffix .lzma, .tlz, .xz, .txz).
For example:
ugrep -z PATTERN archive.tgz
This greps each of the archived files to display PATTERN matches with the archived filenames. Archived filenames are shown in braces to distinguish them from ordinary filenames. Everything else is the same as grep (ugrep has the same options and produces the same output). For example:
$ ugrep -z "Hello" archive.tgz
{Hello.bat}:echo "Hello World!"
Binary file archive.tgz{Hello.class} matches
{Hello.java}:public class Hello // prints a Hello World! greeting
{Hello.java}: { System.out.println("Hello World!");
{Hello.pdf}:(Hello)
{Hello.sh}:echo "Hello World!"
{Hello.txt}:Hello
If you just want the file names, use option -l (--files-with-matches) and customize the filename output with option --format="%z%~" to get rid of the braces:
$ ugrep -z Hello -l --format="%z%~" archive.tgz
Hello.bat
Hello.class
Hello.java
Hello.pdf
Hello.sh
Hello.txt
Tarballs (.tar.gz/.tgz, .tar.bz2/.tbz, .tar.xz/.txz, .tar.lzma/.tlz) are searched as well as .zip archives.
You can mount the TAR archive with ratarmount and then simply search for the pattern in the mounted view:
pip install --user ratarmount
ratarmount large-archive.tar mountpoint
grep -r '<pattern>' mountpoint/
This should be much faster than iterating over each file and printing it to stdout, especially for compressed TARs.
Here is a simple comparison benchmark:
function checkFilesWithRatarmount()
{
    local pattern=$1
    local archive=$2
    ratarmount "$archive" "$archive.mountpoint"
    'grep' -r -l "$pattern" "$archive.mountpoint/"
}

function checkEachFileViaStdOut()
{
    local pattern=$1
    local archive=$2
    tar --list --file "$archive" | while read -r file; do
        if tar -x --file "$archive" -O -- "$file" | grep -q "$pattern"; then
            echo "Found pattern in: $file"
        fi
    done
}

function createSampleTar()
{
    for i in $( seq 40 ); do
        head -c $(( 1024 * 1024 )) /dev/urandom | base64 > $i.dat
    done
    tar -czf "$1" [0-9]*.dat
}

createSampleTar myarchive.tar.gz
time checkEachFileViaStdOut ABCD myarchive.tar.gz
time checkFilesWithRatarmount ABCD myarchive.tar.gz

sleep 0.5s
fusermount -u myarchive.tar.gz.mountpoint
Results in seconds for a 55 MiB uncompressed and 42 MiB compressed TAR archive containing 40 files:
Compression   Ratarmount     Bash loop over tar -O
none          0.31 +- 0.01   0.55 +- 0.02
gzip          1.1  +- 0.1    13.5 +- 0.1
bzip2         1.2  +- 0.1    97.8 +- 0.2
Of course, these results are highly dependent on the archive size and how many files the archive contains. These test examples are pretty small because I didn't want to wait too long but they already show the problem. The more files there are, the longer it takes for tar -O to jump to the correct file. And for compressed archives, it will be quadratically slower the larger the archive size is because everything before the requested file has to be decompressed and each file is requested separately. Both of these problems are solved by ratarmount.
