An appendable compressed archive - unix

I have a requirement to maintain a compressed archive of log files. The log filenames are unique and the archive, once expanded, is simply one directory containing all the log files.
The current solution isn't scaling well, since it involves a gzipped tar file: every time a log file is added, the entire archive is decompressed, the file is added, and the whole thing is re-gzipped.
Is there a Unix archive tool that can add to a compressed archive without completely expanding and re-compressing? Or can gzip perform this, given the right combination of arguments?

I'm using zip -Zb for that (appending text logs incrementally to a compressed archive):
- fast append (the index is at the end of the archive, so it is efficient to update)
- -Zb uses the bzip2 compression method instead of deflate. In 2018 this seems safe to use (you'll need a reasonably modern unzip -- note that some tools assume deflate when they see a zip file, so YMMV)
7z was a good candidate: compression ratio is vastly better than zip when you compress all files in the same operation. But when you append files one by one to the archive (incremental appending), compression ratio is only marginally better than standard zip, and similar to zip -Zb. So for now I'm sticking with zip -Zb.
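For reference, a minimal sketch of the workflow (the filenames are made up; assumes an Info-ZIP zip/unzip recent enough to support the bzip2 method):
$ zip -Zb logs.zip foo-20180101.log    # create the archive with a bzip2-compressed entry
$ zip -Zb logs.zip foo-20180102.log    # append the next log; existing entries are not recompressed
$ unzip -l logs.zip                    # list the entries to confirm both logs are present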
To clarify what happens and why having the index at the end is useful for "appendable" archive format, with entries compressed individually:
Before:
############## ########### ################# #
[foo1.png    ] [foo2.png ] [foo3.png       ] ^
                                             |
                                         index

After:
############## ########### ################# ########### #
[foo1.png    ] [foo2.png ] [foo3.png       ] [foo4.png ] ^
                                                         |
                                                 new index
So this is not fopen in append mode, but presumably fopen in write mode, then fseek, then write (that's my mental model of it, someone let me know if this is wrong). I'm not 100% certain that it would be so simple in reality, it might depend on OS and file system (e.g. a file system with snapshots might have a very different opinion about how to deal with small writes at the end of a file… huge "YMMV" here 🤷🏻‍♂️)

It's rather easy to have an appendable archive of compressed files (not the same as an appendable compressed archive, though).
tar has an option to append files to the end of an archive (assuming you have GNU tar):
-r, --append
      append files to the end of an archive
You can gzip the log files before adding them to the archive, and keep updating (appending to) the archive with newer files.
$ ls
foo-20130101.log
foo-20130102.log
foo-20130103.log
$ gzip foo*
$ ls
foo-20130101.log.gz
foo-20130102.log.gz
foo-20130103.log.gz
$ tar cvf backup.tar foo*gz
Now you have another log file to add to the archive:
$ ls
foo-20130104.log
$ gzip foo-20130104.log
$ tar rvf backup.tar foo-20130104.log.gz
$ tar tf backup.tar
foo-20130101.log.gz
foo-20130102.log.gz
foo-20130103.log.gz
foo-20130104.log.gz
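As a side note (not part of the original recipe): a single log can be read back later without unpacking the whole archive, assuming GNU tar:
$ tar xOf backup.tar foo-20130104.log.gz | gunzip -c | tail -n 20
Here -O (--to-stdout) extracts the member to standard output, gunzip -c decompresses it, and tail shows its last 20 lines.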

If you don't need to use tar, I suggest 7-Zip. It has an 'add' command, which I believe does what you want.
See related SO question: Is there a way to add a folder to existing 7za archive?
Also, the 7-Zip documentation: https://sevenzip.osdn.jp/chm/cmdline/commands/add.htm
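For example, something along these lines (logs.7z is a made-up archive name; the a command creates the archive if it does not exist and adds to it otherwise):
$ 7z a logs.7z foo-20130104.log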

Related

while loop to restrict the number of downloaded files continues beyond conditional parameters

I want to download a large number of FTP files. I made a list file that contains the links to thousands of FTP files. The FTP downloads result in 'gbff.gz' files. Now, say that for some reason I want to restrict the number of downloaded files (with the .gz extension) in the current directory to 5.
To test that, I made a while loop in R that uses a bash system command:
setwd("~/Desktop/test")
a<-system('find . | grep -i ".gz$" |wc -l')
while (a<5) {
system('wget -nc -tries=1 -i list.txt')
}
But it seems the while loop is not working. I mean, ideally it should break out of the loop when the number of .gz files in the current directory is more than 5, but the download continues for all links in the list file.
N.B. My apologies for making such a hybrid script; as I am already working in R, it seemed easiest to me. I would also appreciate an alternative bash/awk/sed script if that is more suitable for this.
N.B. 2: FYI, I use the -nc flag in wget, so an already existing file should not be re-downloaded.
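Since the question explicitly invites a plain-shell alternative, here is a rough, untested sketch: it reads list.txt one URL at a time and stops once five .gz files exist in the current directory.
while read -r url; do
    count=$(find . -name '*.gz' | wc -l)    # how many .gz files exist so far
    [ "$count" -ge 5 ] && break             # stop once the limit is reached
    wget -nc --tries=1 "$url"
done < list.txt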

unzip command ends successfully without unzipping file

There are double-compressed files with the extension xxx.zip.gz.
On gunzip, a xxx.zip file of size 0.25 GB is created.
On unzip after gunzip, the xxx.zip file remains; its extension does not change.
Output of unzip :
Archive: xxx.zip
inflating: xxx.txt
Also, echo $? shows 0.
So even though the unzip command completed successfully, the file still remains with the .zip extension. Any help?
OS - SunOS 5.10
You're finding that xxx.txt is being created, right?
unzip and gunzip have different "philosophies" about dealing with their archives: gunzip gets rid of the .gz file, while unzip leaves its zip file in place. So in your case, unzip is working as designed.
I think the best you can do is
unzip -q xxx.zip && /bin/rm xxx.zip
This will only delete the zip file if unzip exits without error. The -q option makes unzip quiet, so you won't get the status messages you included above.
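For the double-compressed xxx.zip.gz case, a rough sketch of the whole round trip (file names follow the question; adjust to taste):
for f in *.zip.gz; do
    gunzip "$f" || continue           # produces the matching .zip (gunzip removes the .gz)
    z="${f%.gz}"
    unzip -q "$z" && /bin/rm "$z"     # delete the .zip only if unzip exits cleanly
done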
edit
As you asked: when the zip file itself is 10+ GB in size, unzip does not succeed.
Assuming you are certain there is enough disk space to hold the expanded original file, it's hard to say. How big is the expanded file? Over 2 GB? SunOS 5, I believe, used to have a 2 GB file-size limit, requiring 'large-file' support to be added to the kernel and utilities. I don't have access to a Sun box anymore, so I can't confirm. I think you'll find places to look with apropos largefile (assuming your $MANPATH is set up correctly).
But the basic test for whether the unzip worked correctly would be something like
if unzip "${file}" ; then
    echo "clean unzip for ${file}, deleting the archive file" >&2
    /bin/rm "${file}"
else
    echo "error running unzip for ${file}, archive file remains in place" >&2
fi
(Or I don't understand your use case). Feel free to post another question showing ls -l xxx.zip.gz xxx.zip and other details to help reconstruct your expected workflow.
IHTH.

zip files of a certain date to zip file with batch

I'm trying to make a batch file to run every day. The batch needs to zip all the files with the current system date.
I have the following for now and it works, but it zips all the files.
@echo off
"c:\Program Files\7-Zip\7z.exe" a -tzip "C:\test\location_zip\zip_file_%date:~6,4%-%date:~3,2%-%date:~0,2%" "C:\test\*.*"
pause
Can someone please help me zip only the files created on the current system date instead of all of them?
Many thanks,
I do not have a 7-zip solution. But with WinRAR this can be very easily achieved:
"%ProgramFiles%\WinRAR\WinRar.exe" a -ac -afzip -agYYYY-MM-DD -ao -cfg- -ed -ep1 -inul -m5 -r -tn24h "C:\test\location_zip\zip_file_" "C:\test\*.*"
a ... add files to archive.
-ac ... clear archive attribute after compression.
-afzip ... create a ZIP archive instead of a RAR archive.
-agYYYY-MM-DD ... append current date to archive file name in format YYYY-MM-DD.
-ao ... add only files with archive attribute set.
-cfg- ... ignore configuration file and RAR environment variable.
-ed ... do not add empty directories.
-ep1 ... exclude base directory from names, i.e. C:\test\
-inul ... disable all error messages.
-m5 ... use best compression method.
-r ... recurse subdirectories.
-tn24h ... add only files with a last modification date within last 24 hours (newer than 24 hours).
A single license of WinRAR is very cheap and includes unlimited upgrades. The few dollars/euros for a WinRAR license are well invested, considering the time needed to code a batch file for archiving purposes with the free 7-Zip because of missing features that WinRAR has had for more than 10 years. (I bought my WinRAR license in 1999 and have never needed any other compression tool.)

how to undo tar operation?

I used tar -cvf sample_directory/* and didn't specify file.tar.gz. So the Makefile within the folder is in some unreadable format. Is there a way to recover my Makefile?
The Makefile within the folder contains the output from the tar command: with tar -cvf sample_directory/*, the first name the shell expands after -f (here your Makefile) is taken as the archive filename, so it was overwritten with tar output. It's not "some unreadable format", it's a tar archive; unfortunately, that archive won't contain your missing Makefile.
The comments about recovering the Makefile from your backups or from your version control system are apt. This is in fact what you need to do.
If you don't have a backup or the Makefile wasn't checked in to a version control system, then there isn't a feasible way to recover its contents.
Aside from the issue of your poor lost Makefile, a piece of advice about using tar: never tar up a bunch of individual files inside a directory; always tar up the directory itself instead. There is not much more annoying than untarring an archive that contains a big bunch of files instead of a single directory (which then contains the files). Doing that makes a mess by littering files all over whatever directory happens to be the current directory. Please be nice to whoever is going to extract your tar files (which might be yourself, later on!), follow convention, and tar up complete directories.
tar -czf file.tar.gz sample_directory
As a bonus, if you do it that way, and you forget the output filename like this:
tar -czf sample_directory
You won't squash anything, you'll just get an error.

Add last n lines of files to tar/zip

I need to regularly send a collection of log files that can grow quite large, so I would like to only send the last n lines of each of the files.
for example:
/usr/local/data_store1/file.txt (500 lines)
/usr/local/data_store2/file.txt (800 lines)
Given a file with a list of needed files named files.txt, I would like to create an archive (tar or zip) with the last 100 lines of each of those files.
I can do this by creating a separate directory structure with the tail-ed files, but that seems like a waste of resources when there's probably some piping magic that can happen to accomplish it. Full directory structure also must be preserved since files can have the same names in different directories.
I would like the solution to be a shell script if possible, but perl (without added modules) is also acceptable (this is for Solaris machines that don't have ruby/python/etc.. installed on them.)
You could try
tail -n 10 your_file.txt | while read line; do zip /tmp/a.zip $line; done
where a.zip is the zip file and 10 is n. Or, for tar.gz:
tail -n 10 your_file.txt | xargs tar -czvf test.tar.gz --
You are focusing on a specific implementation instead of looking at the bigger picture.
If the final goal is to have an exact copy of the files on the target machine while minimizing the amount of data transferred, what you should use is rsync, which automatically sends only the parts of the files that have changed and can also compress while sending and decompress while receiving.
Running rsync doesn't need any more daemons on the target machine than the standard sshd, and to set up automatic transfers without passwords you just need to use public-key authentication.
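A minimal sketch of that approach (host and destination path are placeholders; assumes ssh public-key authentication is already set up):
rsync -az --relative /usr/local/data_store1/file.txt /usr/local/data_store2/file.txt user@backuphost:/backups/
-a preserves permissions and times, -z compresses in transit, and --relative recreates the full source paths under /backups/ so identically named files don't collide.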
There is no piping magic for that, you will have to create the folder structure you want and zip that.
mkdir tmp
for i in /usr/local/*/file.txt; do
    mkdir -p "`dirname tmp/${i:1}`"    # ${i:1} strips the leading /, recreating the path under tmp/
    tail -n 100 "$i" > "tmp/${i:1}"    # keep only the last 100 lines
done
zip -r zipfile tmp/*
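If the set of files comes from files.txt rather than a glob, a similar sketch works (assuming one absolute path per line):
mkdir tmp
while IFS= read -r f; do
    mkdir -p "tmp/$(dirname "${f#/}")"    # recreate the directory structure under tmp/
    tail -n 100 "$f" > "tmp/${f#/}"       # keep only the last 100 lines
done < files.txt
zip -r zipfile tmp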
Use logrotate.
Have a look inside /etc/logrotate.d for examples.
Why not put your log files in SCM?
Your receiver creates a repository on their machine, from which they retrieve the files by checking them out.
You send the files just by committing them; only the diff will be transmitted.
