Input/output redirection in UNIX

Say I have a file called package.tar.gz
Then I do:
cat package.tar.gz | gzip -d | tar tvf -
and it shows me the list of files in my tar archive.
However if I do:
gzip -d package.tar.gz | tar tvf -
It says tar: This does not look like a tar archive
I don't understand why that is. If the result of gzip -d in the first case returns output which can be interpreted as a tar archive, why won't it work in the second case?
I have seen "Autotools - tar This does not look like a tar archive", but I'm not convinced that it's an issue with tar in my case, since the first command works...

It looks to me like you're not passing the -d option in the second case. From the manpage:
Compressed files can be restored to their original form using gzip
-d or gunzip or zcat.
What's probably most appropriate for that style is zcat which is just what it sounds like - gunzip + cat.

GNU tar will decompress the file directly:
tar -xf package.tar.gz
It automatically detects which decompressor to use (gzip, bzip2, xz, lzip, etc).
If your tar won't handle the decompression, then gzip -cd decompresses to standard output:
gzip -cd package.tar.gz | tar -xf -
The -c option means write to standard output (leaving the original file in place); the -d option means decompress. You could also use gunzip -c in place of gzip -cd. This is 'standard' behaviour for compression programs.
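As a rough sketch of the pipeline above, the following builds a throwaway archive and lists it through the pipe. The directory and file names (pkg, a.txt, b.txt) are made up for the illustration; only gzip -cd and tar -tf - come from the answer.

```shell
#!/bin/sh
# Build a small sample archive in a temporary directory (names are illustrative).
workdir=$(mktemp -d)
mkdir "$workdir/pkg"
echo "hello" > "$workdir/pkg/a.txt"
echo "world" > "$workdir/pkg/b.txt"
tar -cf "$workdir/package.tar" -C "$workdir" pkg
gzip "$workdir/package.tar"            # leaves package.tar.gz

# Decompress to standard output and let tar read the stream from "-".
piped_listing=$(gzip -cd "$workdir/package.tar.gz" | tar -tf -)

# With -c the compressed file is not replaced, so it is still on disk.
if [ -f "$workdir/package.tar.gz" ]; then still_there=yes; else still_there=no; fi

printf '%s\n' "$piped_listing"
echo "package.tar.gz still present: $still_there"
rm -rf "$workdir"
```

The listing is captured in a variable only so it can be inspected afterwards; in interactive use the pipeline on its own is enough.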

Is this what you want to do?
gunzip -c package.tar.gz | tar xvf -
Or
gzip -cd package.tar.gz | tar xvf -

Basically,
gzip -d package.tar.gz
will not output to standard out, which
tar tvf -
expects. The result of
gzip -d package.tar.gz
is that the file is decompressed in place as a side effect: package.tar.gz is replaced by package.tar, and nothing is written to the pipe. You need to use
gzip -dc package.tar.gz | tar tvf -
to get the desired effect.
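To see the side effect for yourself, here is a small sketch that counts the bytes gzip -d sends down the pipe and then checks what is left on disk. The temporary directory and file.txt are placeholders invented for the test.

```shell
#!/bin/sh
# Create a sample package.tar.gz in a scratch directory (illustrative names).
workdir=$(mktemp -d)
echo "some data" > "$workdir/file.txt"
tar -cf "$workdir/package.tar" -C "$workdir" file.txt
gzip "$workdir/package.tar"

# Without -c, gzip -d sends nothing down the pipe ...
stdout_bytes=$(gzip -d "$workdir/package.tar.gz" | wc -c | tr -d ' ')

# ... and instead swaps package.tar.gz for package.tar on disk.
if [ -f "$workdir/package.tar" ] && [ ! -f "$workdir/package.tar.gz" ]; then
    replaced=yes
else
    replaced=no
fi

echo "bytes on stdout: $stdout_bytes"
echo "replaced in place: $replaced"
rm -rf "$workdir"
```

An empty pipe is exactly what makes tar complain that its input does not look like a tar archive.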

Related

pdftk, copying files without taking comments and annotations

I have many PDF files which contain comments and annotations made with Adobe Acrobat Reader. Deleting the comments manually and saving copies of these files would take many hours.
Does PDFtk provide commands to copy files without taking comments and annotations?
You can do this with:
cpdf -remove-annotations in.pdf -o out.pdf
One helpful solution is:
$ export LC_CTYPE=C LANG=C
$ pdftk in.pdf output - uncompress | sed '/^\/Annots/d' | pdftk - output out.pdf compress
The out.pdf has no comments and annotations.
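To make the sed step less magical, here is a sketch that runs the same filter over a made-up fragment of an uncompressed page object. The fragment is hypothetical and much simpler than real pdftk output; it only shows which lines the expression removes.

```shell
#!/bin/sh
# Hypothetical fragment of an uncompressed PDF page dictionary.
sample='<<
/Type /Page
/Annots [12 0 R 13 0 R]
/Contents 5 0 R
>>'

# Delete every line that starts with /Annots; all other lines pass through.
filtered=$(printf '%s\n' "$sample" | LC_ALL=C sed '/^\/Annots/d')
printf '%s\n' "$filtered"
```

Note that the filter only matches /Annots at the very start of a line, so it relies on pdftk writing each dictionary key on its own line.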
To process many files with bash on macOS:
export LC_CTYPE=C LANG=C
paperList=papers.txt
saveDir=../temp_without_annon
mkdir -p "${saveDir}"
# List only the PDFs, one per line, so the list file itself is not included
ls -- *.pdf > "${paperList}"
while IFS= read -r line
do
# Quote the names so that files with spaces are handled correctly
pdftk "${line}" output - uncompress | sed '/^\/Annots/d' | pdftk - output "${saveDir}/${line}" compress
done < "${paperList}"
References
How to install pdftk on Mac OS X
https://stackoverflow.com/a/49614525/5046896

Linux one line command to gzip and move

I have some .txt files in a particular path, /path/doc.txt, and I wish to gzip all the files and move the new archive containing all the .txt files to another path. How can I achieve that in one line of code?
Maybe something like:
find /path/doc/ -type f -name '*.txt' -print0 | xargs -0 tar -z -c -f save.tar.gz && mv save.tar.gz other/path
Use
tar -vtf save.tar.gz
to check the archive contents.
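A variant worth considering, sketched below on a scratch directory, hands the file list to tar directly instead of going through xargs, so tar runs exactly once however many files there are. This assumes a tar that supports --null and -T (GNU tar does); the src and dest directories and the file names are invented for the example.

```shell
#!/bin/sh
# Illustrative layout: src/ holds the .txt files, dest/ receives the archive.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/sub" "$workdir/dest"
echo one > "$workdir/src/a.txt"
echo two > "$workdir/src/sub/b c.txt"      # a name containing a space
echo skip > "$workdir/src/notes.log"       # not a .txt file, must be left out

# NUL-separated names are read by tar from stdin (-T -), then the archive is moved.
( cd "$workdir/src" &&
  find . -type f -name '*.txt' -print0 |
  tar -czf ../save.tar.gz --null -T - ) &&
mv "$workdir/save.tar.gz" "$workdir/dest/"

archived=$(tar -tzf "$workdir/dest/save.tar.gz")
printf '%s\n' "$archived"
rm -rf "$workdir"
```

Running find from inside the source directory keeps the stored paths relative, which is usually what you want when unpacking elsewhere.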

combining gunzip and tar commands in Solaris and AIX

I am running the below command to untar a file in Solaris and AIX:
# gunzip /opt/myfile.tar.gz | tar -xvf-
but I'm getting this error:
tar: Unexpected end-of-file while reading from the storage media.
What do I need to fix?
Why should this work? By default, gunzip unpacks the file in place, replacing the compressed file with the uncompressed one, and you didn't specify the option needed to send the uncompressed data stream to stdout. So the tar command receives nothing through the pipe to process, and you get the error message you have seen.
This will work:
gunzip -c ../myfile.tar.gz | tar -xfv -
This command line was tested on Solaris 11.3. Other tar implementations may need the options in a different order (GNU tar, for instance, reads -xfv as "use the archive named v"), like:
gunzip -c ../myfile.tar.gz | tar -xvf -
I think something like this should work, but I don't have a Solaris system to test it:
gzip -dc /opt/myfile.tar.gz | tar xvf -

Unix Copy Recursive Including All Directories

I have the following two directories:
~/A
  drawable/
    imageb.png
  new/
    newimage.png
~/B
  drawable/
    imagec.png
When I use the cp -r ~/A/* ~/B command, newimage.png is copied across to ~/B along with its new/ folder; however, imageb.png is not copied into ~/B/drawable.
Could you explain why this is the case and how I can get around this?
Use tar instead of cp:
(cd A ; tar cf - *) | (cd B ; tar xf -)
or more compactly (if you're using GNU tar):
tar cC A -f - . | tar xC B -f -
If you are on Linux you can use the -r option.
e.g.: cp -r ~/A/. ~/B/
If you are on BSD you could use the -R option.
e.g.: cp -R ~/A/. ~/B/
For more information on exactly which option you should pass, refer to man cp.
Also note that if you do not have permission to read a file, it will not be copied.
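Here is a quick sketch that recreates the two directories from the question under a temporary directory and runs the A/. form of the copy. The scratch location is an assumption of the example; the file names are the ones from the question.

```shell
#!/bin/sh
# Recreate the layout from the question under a scratch directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/A/drawable" "$workdir/A/new" "$workdir/B/drawable"
touch "$workdir/A/drawable/imageb.png" \
      "$workdir/A/new/newimage.png" \
      "$workdir/B/drawable/imagec.png"

# "A/." copies the contents of A, merging into directories that already exist in B.
cp -R "$workdir/A/." "$workdir/B/"

# List every file now present in B, in a stable order.
result=$(cd "$workdir/B" && find . -type f | LC_ALL=C sort)
printf '%s\n' "$result"
rm -rf "$workdir"
```

The existing imagec.png stays where it is, and imageb.png lands next to it in B/drawable.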

Performing grep operation in tar files without extracting

I have list of files which contain particular patterns, but those files have been tarred. Now I want to search for the pattern in the tar file, and to know which files contain the pattern without extracting the files.
Any idea...?
The tar command has a -O switch to extract your files to standard output, so you can pipe that output to grep/awk:
tar xvf test.tar -O | awk '/pattern/{print}'
tar xvf test.tar -O | grep "pattern"
E.g., to print the name of each file in which the pattern is found:
tar tf test.tar | while IFS= read -r FILE
do
if tar xf test.tar "$FILE" -O | grep "pattern" ;then
echo "found pattern in : $FILE"
fi
done
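For a version you can run end to end, the sketch below builds a two-file archive and collects only the names of the members that match. The file names and the search word "needle" are invented; grep -q is used here so that only the member names are reported, not the matching lines.

```shell
#!/bin/sh
# Build a tiny archive with one matching and one non-matching member.
workdir=$(mktemp -d)
echo "needle in here" > "$workdir/one.txt"
echo "nothing to see" > "$workdir/two.txt"
tar -cf "$workdir/test.tar" -C "$workdir" one.txt two.txt

# For each member, stream its contents to stdout and let grep -q decide.
matches=$(
  tar -tf "$workdir/test.tar" | while IFS= read -r member; do
    if tar -xOf "$workdir/test.tar" "$member" | grep -q "needle"; then
      printf '%s\n' "$member"
    fi
  done
)
printf 'found pattern in: %s\n' "$matches"
rm -rf "$workdir"
```

Nothing is written to disk by the extraction itself, since -O redirects every member to standard output.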
The command zgrep should do exactly what you want, directly.
For example:
zgrep "mypattern" *.gz
http://linux.about.com/library/cmd/blcmdl1_zgrep.htm
GNU tar has --to-command. With it you can have tar pipe each file from the archive into the given command. For the case where you just want the lines that match, that command can be a simple grep. To know the filenames you need to take advantage of tar setting certain variables in the command's environment; for example,
tar xaf thing.tar.xz --to-command="awk -e '/thing.to.match/ {print ENVIRON[\"TAR_FILENAME\"] \":\", \$0}'"
Because I find myself using this often, I have this:
#!/bin/sh
set -eu
if [ $# -lt 2 ]; then
echo "Usage: $(basename "$0") <pattern> <tarfile>"
exit 1
fi
if [ -t 1 ]; then
h="$(tput setf 4)"
m="$(tput setf 5)"
f="$(tput sgr0)"
else
h=""
m=""
f=""
fi
tar xaf "$2" --to-command="awk -e '/$1/{gsub(\"$1\", \"$m&$f\"); print \"$h\" ENVIRON[\"TAR_FILENAME\"] \"$f:\", \$0}'"
This can be done with tar --to-command and grep --label:
tar xaf archive.tar.gz --to-command 'grep -E -Hn --label="$TAR_FILENAME" your_pattern_here || true'
--label gives grep the filename
-H tells grep to display the filename, and -n the line number
|| true because otherwise grep will exit with an error if the pattern is not found, and tar will complain about that.
xaf means to extract, and automagically decompress based off the file extension
--to-command has tar pass each file in the tarfile to a separate invocation of grep, and sets various environment variables with info about the file. See the manpage for more info.
Pretty heavily based off of Chipaca's answer (and Daniel H's comment), but this should be a bit easier to use and just uses tar and grep.
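If you want to try this on something small first, here is a sketch that assumes GNU tar (for --to-command) and a grep that supports --label. The archive contents, hit.txt and miss.txt, are made up for the test.

```shell
#!/bin/sh
# Assumes GNU tar (--to-command) and a grep with --label support.
workdir=$(mktemp -d)
printf 'alpha\nyour_pattern_here\n' > "$workdir/hit.txt"
printf 'beta\n' > "$workdir/miss.txt"
tar -czf "$workdir/archive.tar.gz" -C "$workdir" hit.txt miss.txt

# tar runs the command once per member, feeding the member on stdin and
# exporting TAR_FILENAME; "|| true" hides grep's no-match exit status from tar.
report=$(tar -xzf "$workdir/archive.tar.gz" \
  --to-command 'grep -Hn --label="$TAR_FILENAME" your_pattern_here || true')
printf '%s\n' "$report"
rm -rf "$workdir"
```

The single quotes matter: they stop the calling shell from expanding $TAR_FILENAME, so the variable is expanded when tar runs the command for each member.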
Python's tarfile module along with TarFile.extractfile() will allow you to inspect the tarball's contents without extracting it to disk.
The easiest way is probably to use avfs. I've used this before for such tasks.
Basically, the syntax is:
avfsd ~/.avfs # Sets up an avfs virtual filesystem
rgrep pattern ~/.avfs/path/to/file.tar#/
/path/to/file.tar is the path to the actual tar file.
Pre-pending ~/.avfs/ (the mount point) and appending # lets avfs expose the tar file as a directory.
That's actually very easy with ugrep option -z:
-z, --decompress
Decompress files to search, when compressed. Archives (.cpio,
.pax, .tar, and .zip) and compressed archives (e.g. .taz, .tgz,
.tpz, .tbz, .tbz2, .tb2, .tz2, .tlz, and .txz) are searched and
matching pathnames of files in archives are output in braces. If
-g, -O, -M, or -t is specified, searches files within archives
whose name matches globs, matches file name extensions, matches
file signature magic bytes, or matches file types, respectively.
Supported compression formats: gzip (.gz), compress (.Z), zip,
bzip2 (requires suffix .bz, .bz2, .bzip2, .tbz, .tbz2, .tb2, .tz2),
lzma and xz (requires suffix .lzma, .tlz, .xz, .txz).
For example:
ugrep -z PATTERN archive.tgz
This greps each of the archived files to display PATTERN matches with the archived filenames. Archived filenames are shown in braces to distinguish them from ordinary filenames. Everything else is the same as grep (ugrep has the same options and produces the same output). For example:
$ ugrep -z "Hello" archive.tgz
{Hello.bat}:echo "Hello World!"
Binary file archive.tgz{Hello.class} matches
{Hello.java}:public class Hello // prints a Hello World! greeting
{Hello.java}: { System.out.println("Hello World!");
{Hello.pdf}:(Hello)
{Hello.sh}:echo "Hello World!"
{Hello.txt}:Hello
If you just want the file names, use option -l (--files-with-matches) and customize the filename output with option --format="%z%~" to get rid of the braces:
$ ugrep -z Hello -l --format="%z%~" archive.tgz
Hello.bat
Hello.class
Hello.java
Hello.pdf
Hello.sh
Hello.txt
Tarballs (.tar.gz/.tgz, .tar.bz2/.tbz, .tar.xz/.txz, .tar.lzma/.tlz) are searched as well as .zip archives.
You can mount the TAR archive with ratarmount and then simply search for the pattern in the mounted view:
pip install --user ratarmount
ratarmount large-archive.tar mountpoint
grep -r '<pattern>' mountpoint/
This should be much faster than iterating over each file and printing it to stdout, especially for compressed TARs.
Here is a simple comparison benchmark:
function checkFilesWithRatarmount()
{
local pattern=$1
local archive=$2
ratarmount "$archive" "$archive.mountpoint"
'grep' -r -l "$pattern" "$archive.mountpoint/"
}
function checkEachFileViaStdOut()
{
local pattern=$1
local archive=$2
tar --list --file "$archive" | while read -r file; do
if tar -x --file "$archive" -O -- "$file" | grep -q "$pattern"; then
echo "Found pattern in: $file"
fi
done
}
function createSampleTar()
{
for i in $( seq 40 ); do
head -c $(( 1024 * 1024 )) /dev/urandom | base64 > $i.dat
done
tar -czf "$1" [0-9]*.dat
}
createSampleTar myarchive.tar.gz
time checkEachFileViaStdOut ABCD myarchive.tar.gz
time checkFilesWithRatarmount ABCD myarchive.tar.gz
sleep 0.5s
fusermount -u myarchive.tar.gz.mountpoint
Results in seconds for a 55 MiB uncompressed and 42 MiB compressed TAR archive containing 40 files:
Compression   Ratarmount     Bash loop over tar -O
none          0.31 +- 0.01   0.55 +- 0.02
gzip          1.1 +- 0.1     13.5 +- 0.1
bzip2         1.2 +- 0.1     97.8 +- 0.2
Of course, these results are highly dependent on the archive size and how many files the archive contains. These test examples are pretty small because I didn't want to wait too long but they already show the problem. The more files there are, the longer it takes for tar -O to jump to the correct file. And for compressed archives, it will be quadratically slower the larger the archive size is because everything before the requested file has to be decompressed and each file is requested separately. Both of these problems are solved by ratarmount.
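The quadratic growth can be illustrated with a back-of-the-envelope count. This is only a rough model, not a measurement: it assumes n equally sized members and that fetching member k costs reading members 1 through k, as described above. The "units" are arbitrary.

```shell
#!/bin/sh
# Illustrative model: n equally sized members, cost counted in "units" read.
n=40

# A single sequential read over the archive touches each member once.
single_pass=$n

# Requesting member k separately means reading members 1..k again,
# so the loop reads 1 + 2 + ... + n = n(n+1)/2 units in total.
loop_total=0
k=1
while [ "$k" -le "$n" ]; do
  loop_total=$((loop_total + k))
  k=$((k + 1))
done

echo "single pass: $single_pass units"
echo "per-member loop: $loop_total units"
```

Doubling n roughly quadruples the loop total, which is the shape of the slowdown seen with compressed archives.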
