Which archiving utility should I use in Ubuntu? - encryption

I am a Mac/Ubuntu user. I have folders such as "AWK", "awk", "awk_tip" and "awk_notes". I need to archive them, but the variety of utilities confuse me. I had a look at Tar, cpio and pax, but Git has started to fascinate me. I occasionally need encryption and backups.
Please, list the pros and cons of different archiving utilities.

Tar, cpio and pax are ancient Unix utilities. For instance, tar (which is probably the most common of these) was originally intended for making backups on tapes (hence the name, tar = tape archive).
The most commonly used archive formats today are:
tar (in Unix/Linux environments)
tar.gz or tgz (a gzip compressed tar file)
zip (in Windows environments)
If you want just one simple tool, take zip. It works right out of the box on most platforms, and it can be password protected (although the protection is technically weak).
If you need stronger protection (encryption), check out TrueCrypt. It is very good.

Under what OS / toolchain are you working? This might limit the range of existing solutions. Your name suggests Unix, but which one? Further, do you need portability or not?
The standard linux solution (at least to a newbie like me) might be to tar and gzip or bzip2 the folders, then encrypt them with gnupg if you really have to (encrypting awk tutorials seems a bit of overkill to me). You can also use full-fledged backup solutions like bacula, sync to a different location with rsync (perhaps sync to a backup server?).

If you've backing up directories from an ext2/ext3 filesystem, you may want to consider using dump. Some nice features:
it can backup a directory or a whole partition
it saves permissions and timestamps,
it allows you to do incremental backups,
it can compress (gzip or bzip2)
it will automatically split the archive into multiple parts based on a size-limit if you want
it will backup over a network or to a tape as well as a file
It doesn't support encryption, but you can always encrypt the dump files afterwards.

Related

rsync doesn't copy *only* modifications

I'm using rsync to backup my files. I choose rysnc because it (should) use modification times to determine if changes have been made and if files need to be updated.
I started my backup (from my computer system (debian) to a portable external hard drive) with this command:
rsync -avz --update --delete --stats --progress --exclude-from=/home/user/scripts/ExclusionRSync --backup --backup-dir=/media/user/hdd/backups/deleted-files /home/user/ /media/user/hdd/backups/backup_user
It worked well and took a lot of time. I believed the second time would be very quick (since I didn't modify files). Unfortunately, the 2nd, 3th, 4th, ... times took as long as the first one. I still see all my files being copied even if these files are already in my portable hard drive.
I don't understand why rsync doesn't copy only modifications (rsync is known to be efficient and only copy changes and I specificly call --update option).
A side effect of this problem is that all files are moved to my backup dir (deleted-filed) as soon as they are transfered. Indeed, rsync delete the previous file before to copy the same file during each update...
I found the solution reading an answer on Serverfault.SE. The Fat filesystem was messing with timestamps:
FAT doesn't track modification times on files as precisely as, say
ext3 (FAT is only precise to within a 2 second window). This leads to
particularly nasty behavior with rsync as it will sometimes decide
that the original files is newer or older than the backup file by
enough that it needs to re-copy the data or at least re-check the
hashes. All in all, it makes for very poor performance on backups. If
you must stick with FAT, look into rsync's --size-only and
--modify-window flags as workarounds.

Encrypting files added to Mercurial repositories on commit

Having read this past question for git, I would like to ask if there exists something like that, but
can be done programmatically (file list) on each machine;
works for Mercurial.
The reason for this is that I would like to include in my public dotfiles repository some configuration files that store password in plaintext. I know I could write a wraparound script for hg(1) but I would like to know if there are alternative approaches, just for the sake of curiosity.
Thank you.
You could use a pair of pre-commit and post-update hooks to encrypt/decrypt as necessary. See http://hgbook.red-bean.com/read/handling-repository-events-with-hooks.html for more details.
However, it's worth pointing out that if you're storing encrypted text in your repo you'll be unable to create meaningful diffs -- essentially everything will be like a binary file but also poorly compressible.
Mercurial has a filter system that lets you mangle files when they are read from the repository or written back. If you have a program like the SSH agent running that lets you do non-interactive encryption and decryption, then this might just be workable.
As Ryan points out, this will necessarily lead to a bigger repository since each encrypted version of your files will look completely different from the previous version. Mercurial detects this and stores the versions uncompressed (encrypted files cannot be compressed anyway). Since you will use this for dotfiles, you can ignore the space overhead, but it's something to take into consideration if you will be versioning bigger files in encrypted form.
Please post a mail to Mercurial mailing list with your experiences so that other users can benefit from them too.

Encryption for Folders

Is there a directory-encryption variant similar to VIM's "vim -x file"? I am looking for something like "mkdir -encrypt folder".
There is no "general" way to encrypt directories (ie, one that works across all file and operating systems) (see below).
You can, however (as Dante mentioned) use TrueCrypt to create an encrypted filesystem in a file, then mount ("attach", in Windows terminology?) that file.
If you're using Linux, you can even mount that file at a particular directory, to make it appear that the directory is encrypted.
If you want to know how to use TrueCrypt, checkout the docs for Windows here: http://www.truecrypt.org/docs/?s=tutorial and for Linux here: http://www.howtoforge.com/truecrypt_data_encryption (scroll down to the "TrueCrypt Download" heading).
So, a quick explanation why you can encrypt files but not directories:
As far as the "computer" (that is, the hardware, operating system, filesystem drivers, etc) is considered, "files" are just "a bunch of bits on disk" (in the same way a book is "just a bunch of ink on paper"). When a program reads from or writes to a file, it can read or write whatever the heck it wants -- so if that program wants to encrypt some data before writing it to the file, or read a file then decrypt the data that it reads, great.
Directories are a different story, though: to read (ie, list) or write (ie, create) directories, the program (be it, mkdir, ls, Windows Explorer or Finder) has to ask the operating systeme, then the operating system asks the filesystem driver "Hey, can you make the directory /foo/bar?" or "hey, can you tell me what's in /bar/baz?" -- all the program or operating system see (basically) is a function to make directories and a function to list the contents of a directory.
So, to encrypt a directory, you can see that it would have to be the filesystem driver that is doing the encryption, not the program creating/listing the directories... And no modern filesystems support per-directory encryption.
On Linux, the simplest way is probably to use EncFS
"EncFS provides an encrypted filesystem in user-space. It runs without any special permissions and uses the FUSE library and Linux kernel module to provide the filesystem interface."
it basically mounts an encrypted folder as a plain one.
More info on wikipedia
TrueCrypt Its open source and supports multiple types of encryption.. What operating system do you wish to know about?
Edit: Windows Vista/XP, Mac OS X, and Linux are all supported.
I would recommend Enterprise Cryptographic Filesystem i.e. ecryptfs found in apt-get as ecryptfs-utils in Debian/Ubuntu because more flexible than TrueCrypt.
It is probably one of the strongest way here to encrypt the directory.
It can be used with two passwords: login passhrase and password so making it a kind of double password system.
It is also POSIX implemented.
The limitation of this system like many other encryption systems is that it supports only filenames/directory names up to 144, in contrast to 255 Linux standard.
Maintained four years and last update 4 months ago so a good thing for future.
Comparison between TrueCrypt and encryptfs from this blog post
Truecrypt is simulated hardware encryption. It creates a virtual
encrypted hard disk, which your operating system can more or less
treat like an ordinary hard disk, but for the kernel hooks Truecrypt
adds to lock and unlock the disk. EcryptFS is an encrypted filesystem.
Unlike Truecrypt, which encrypts individual disk blocks, systems like
EcryptFS encrypt and decrypt whole files.
and more comparasion between the two systems here:
Those complications (and the fact that ecryptfs is slower) are part of
why people like block-level encryption like TrueCrypt, but I do
appreciate the flexibility of ecryptfs.

Keeping dot files synched across machines?

Like most *nix people, I tend to play with my tools and get them configured just the way that I like them. This was all well and good until recently. As I do more and more work, I tend to log onto more and more machines, and have more and more stuff that's configured great on my home machine, but not necessarily on my work machine, or my web server, or any of my work servers...
How do you keep these config files updated? Do you just manually copy them over? Do you have them stored somewhere public?
I've had pretty good luck keeping my files under a revision control system. It's not for everyone, but most programmers should be able to appreciate the benefits.
Read
Keeping Your Life in Subversion
for an excellent description, including how to handle non-dotfile configuration (like cron jobs via the svnfix script) on multiple machines.
I also use subversion to manage my dotfiles. When I login to a box my confs are automagically updated for me. I also use github to store my confs publicly. I use git-svn to keep the two in sync.
Getting up and running on a new server is just a matter of running a few commands. The create_links script just creates the symlinks from the .dotfiles folder items into my $HOME, and also touches some files that don't need to be checked in.
$ cd
# checkout the files
$ svn co https://path/to/my/dotfiles/trunk .dotfiles
# remove any files that might be in the way
$ .dotfiles/create_links.sh unlink
# create the symlinks and other random tasks needed for setup
$ .dotfiles/create_links.sh
It seems like everywhere I look these days I find a new thing that makes me say "Hey, that'd be a good thing to use DropBox for"
Rsync is about your best solution. Examples can be found here:
http://troy.jdmz.net/rsync/index.html
I use git for this.
There is a wiki/mailing list dedicated to the topic.
vcs-home
I would definetly recommend homesick. It uses git and automatically symlinks your files. homesick track tracks a new dotfile, while homesick symlink symlinks new dotfiles from the repository into your homefolder. This way you can even have more than one repository.
You could use rsync. It works through ssh which I've found useful since I only setup new servers with ssh access.
Or, create a tar file that you move around everywhere and unpack.
I store them in my version control system.
i use svn ... having a public and a private repository ... so as soon as i get on a server i just
svn co http://my.rep/home/public
and have all my dot files ...
I store mine in a git repository, which allows me to easily merge beyond system dependent changes, yet share changes that I want as well.
I keep master versions of the files under CM control on my main machine, and where I need to, arrange to copy the updates around. Fortunately, we have NFS mounts for home directories on most of our machines, so I actually don't have to copy all that often. My profile, on the other hand, is rather complex - and has provision for different PATH settings, etc, on different machines. Roughly, the machines I have administrative control over tend to have more open source software installed than machines I use occasionally without administrative control.
So, I have a random mix of manual and semi-automatic process.
There is netskel where you put your common files on a web server, and then the client program maintains the dot-files on any number of client machines. It's designed to run on any level of client machine, so the shell scripts are proper sh scripts and have a minimal amount of dependencies.
Svn here, too. Rsync or unison would be a good idea, except that sometimes stuff stops working and i wonder what was in my .bashrc file last week. Svn is a life saver in that case.
Now I use Live Mesh which keeps all my files synchronized across multiple machines.
I put all my dotfiles in to a folder on Dropbox and then symlink them to each machine. Changes made on one machine are available to all the others almost immediately. It just works.
Depending on your environment you can also use (fully backupped) NFS shares ...
Speaking about storing dot files in public there are
http://www.dotfiles.com/
and
http://dotfiles.org/
But it would be really painful to manually update your files as (AFAIK) none of these services provide any API.
The latter is really minimalistic (no contact form, no information about who made/owns it etc.)
briefcase is a tool to facilitate keeping dotfiles in git, including those with private information (such as .gitconfig).
By keeping your configuration files in a git public git repository, you can share your settings with others. Any secret information is kept in a single file outside the repository (it’s up to you to backup and transport this file).
-- http://jim.github.com/briefcase
mackup
https://github.com/lra/mackup
Ira/mackup is a utility for Linux & Mac systems that will sync application preferences using almost any popular shared storage provider (dropbox, icloud, google drive). It works by replacing the dot files with symlinks.
It also has a large library of hundreds of applications that are supported https://github.com/lra/mackup/tree/master/mackup/applications

What makes the Unix file system more superior to the Windows file system?

I'll admit that I don't know the inner workings of the unix operating system, so I was hoping someone could shed some light on this topic.
Why is the Unix file system better than the windows file system?
Would grep work just as well on Windows, or is there something fundamentally different that makes it more powerful on a Unix box?
e.g. I have heard that in a Unix system, the number of files in a given directory will not slow file access, while on Windows direct file access will degrade as the # of files increase in the given folder, true?
Updates:
Brad, no such thing as the unix file system?
One of the fundamental differences in filesystem semantics between Unix and Windows is the idea of inodes.
On Windows, a file name is directly attached to the file data. This means that the OS prevents somebody from deleting a file that is currently open. On some versions of Windows you can rename a file that is currently open, and on some versions you can't.
On Unix, a file name is a pointer to an inode, which is the place the file data is actually stored. This has a couple of implications:
You can have two different filenames that refer to the same underlying file. This is often called a hard link. There is only one copy of the file data, so changes made through one filename will appear in the other.
You can delete (also known as unlink) a file that is currently open. All that happens is the directory entry is removed, but this doesn't affect any other process that might still have the file open. The process with the file open hangs on to the inode, rather than to the directory entry. When the process closes the file, the OS deletes the inode because there are no more directory entries pointing at it and no more processes with the inode open.
This difference is important, but it is unrelated to things like the performance of grep.
First, there is no such thing as "the Unix file system".
Second, upon what premise does your argument rest? Did you hear someone say it was superior? Perhaps if you offered some source, we could critique the specific argument.
Edit: Okay, according to http://en.wikipedia.org/wiki/Comparison_of_file_systems, NTFS has more green boxes than both UFS1 and UFS2. If green boxes are your measure of "better", then NTFS is "better".
Still a stupid question. :-p
I think you are a little bit confused. There is no 'Unix' and 'Windows' file systems. The *nix family of filesystems include ext3, ZFS, UFS etc. Windows primarily has had support for FAT16/32 and their own filesystem NTFS. However today linux systems can read and write to NTFS. More filesystems here
I can't tell you why one could be better than the other though.
I'm not at all familiar with the inner workings of the UNIX file systems, as in how the bits and bytes are stored, but really that part is interchangeable (ext3, reiserfs, etc).
When people say that UNIX file systems are better, they might mean to be saying, "Oh ext3 stores bits in such as way that corruption happens way less than NTFS", but they might also be talking about design choices made at the common layer above. They might be referring to how the path of the file does not necessarily correspond to any particular device. For example, if you move your program files to a second disk, you probably have to refer to them as "D:\Program Files", while in UNIX /usr/bin could be a hard drive, a network drive, a CD ROM, or RAM.
Another possibility is that people are using "file system" to mean the organization of paths. Like, for instance, how Windows generally likes programs in "C:\Program Files\CompanyName\AppName" while a particular UNIX distribution might put most of them in /usr/local/bin. In the later case, you can access much more of your system readily from the command line with a much smaller PATH variable.
Also, since you mentioned grep, if all the source code for system libraries such as the kernel and libc is stored in /usr/local/src, doing a recursive grep for a particular error message coming from the guts of some system library is much simpler than if things were laid out as /usr/local/library-name/[bin|src|doc|etc]. If you already have an inkling of where you're searching, though, cygwin grep performs quite well under Windows. In fact, I find for full-text searching I get better results from grep than the search facilities built into Windows!
well the *nix filesystems do a far better job of actual file managment than fat16/32 or NTFS. The *nix systems try to prevent the need for a defrag over windows doing...nothing? Other than that I don't really know what would make one better than the other.
There are differences in how Windows and Unix operating systems expose the disk drives to users and how drive space is partitioned.
The biggest difference between the two operating systems is that Unix essentially treats all of the physical drives as one logical drive. (This isn't exactly how it works, but should give a good enough picture.) This allows a much simpler file system from the users perspective as there are no drive letters to deal with. I have a folder called /usr/bin that could span multiple physical drives. If I need to expand that partition I can do so by adding a new drive, remapping the folder, and moving the files. (Again, somewhat simplified, but it gets the point across.)
The other difference is that when you format a drive, a certain amount is set aside (by default, as an admin you can change the size to 0 if you want) for use by the "root" account (admin account) which allows an admin to almost always be able to log in to the machine even when the user has filled the disk and is receiving "out of disk space" messages.
One simple answer:
Windows is a proprietary which means no one can see it's code except windows, while unix/linux are open-source. So as it is open-source many brighter minds have contributed towards the filesystem making it one of the robust and efficient, hence effective commands like grep come to our rescue when needed truly.
I don't know enough about the guts of the file systems to answer the first, except when I read the first descriptions of NTFS it sounded an awful lot like the Berkley Fast Filesystem.
As for the second, there are plenty of greps for Windows. When I had to use Windows in the past, I always installed Cygwin first thing.
The answer turns out to have very little to do with the filesystem and everything to do with the filesystem access drivers.
In particular, the implementation of NTFS on Windows is very slow compared to ext2/ext3. Also on Windows, "can't delete file in use" even though NTFS should be able to support it.

Resources