Why is '.' a hard link in Unix? [closed] - unix

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I've seen many explanations for why the link count for an empty directory in Unix based OSes is 2 instead of 1. They all say that it's because of the '.' directory, which every directory has pointing back to itself. I understand why having some concept of '.' is useful for specifying relative paths, but what is gained by implementing it at the filesystem level? Why not just have shells or the system calls that take paths know how to interpret it?
That '..' is a real link makes much more sense to me -- the filesystem needs to store a pointer back to the parent directory in order to navigate to it. But I don't see why '.' being a real link is necessary. It also seems like it leads to an ugly special case in the implementation -- you would think you could only free the space used by inodes that have a link count less than 1, but if they're directories, you actually need to check for a link count less than 2. Why the inconsistency?

Why not just have shells or the system calls that take paths know how
to interpret it?
For transparency. If the filesystem does it, the applications (and the myriad system calls) don't have to do anything special with "." like "Oh, the user wants the current directory!". The notion of cwd and whatever it means is stored neatly out of the way at the FS level.
It also seems like it leads to an ugly special case in the
implementation -- you would think you could only free the space used
by inodes that have a link count less than 1, but if they're
directories, you actually need to check for a link count less than 2.
It's not a special case. All files in Unix have a number of links. Any file you unlink is checked "Is this the last link?". If it is, it gets the chop. If not, it lingers around.

(Hmm: the following is now a bit of an epic...)
The design of the directory on unix filesystems (which, to be pedantic, are typically but not necessarily attached to unix OSs) represents a wonderful insight, which actually reduces the number of special cases required.
A 'directory' is really just a file in the filesystem. All the actual content of files in the filesystem is in inodes (from your question, I can see that you're already aware of some of this stuff). There's no structure to the inodes on the disk -- they're just a big bunch of numbered blobs of bytes, spread like peanut-butter over the disk. This is not useful, and indeed is repellent to anyone with a shred of tidy-mindedness.
The only special inode is inode number 2 (not 0 or 1, for reasons of Tradition); inode 2 is a directory file: the root directory. When the system mounts the filesystem, it 'knows' it has to readdir inode 2, to get itself started.
A directory file is just a file, with an internal structure which is intended to be read by opendir(3) and friends. You can see its internal structure documented in dir(5) (depending on your OS); if you look at that, you'll see that the directory file entry contains almost no information about the file -- that's all in the file inode. One of the few things that's special about this file is that the open(2) function will give an error if you try to open a directory file with a mode which permits writing. Various other commands (to pick just one example, hexdump) will refuse to act in the normal way with directory files, just because that's probably not what you want to do (but that's their special case, not the filesystem's).
A hard link is nothing more nor less than an entry in a directory file's map. You can have two (or more) entries in such a map which both map to the same inode number: that inode therefore has two (or more) hard links. This also explains why every file has at least one 'hard link'. The inode has a reference count, which records how many times that inode is mentioned in a directory file somewhere in the filesystem (this is the number which you see when you do ls -l).
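As a minimal sketch of the paragraph above (the file names "first" and "second" are made up for illustration, and it assumes a filesystem that supports hard links), the following creates a second directory entry for one inode and reads the reference count back with stat:

```python
import os
import tempfile

def hard_link_demo():
    """Create a file, give it a second name, then remove the first name."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first")
        second = os.path.join(d, "second")
        with open(first, "w") as f:
            f.write("hello")
        links_before = os.stat(first).st_nlink

        # Add a second directory entry that maps to the same inode.
        os.link(first, second)
        same_inode = os.stat(first).st_ino == os.stat(second).st_ino
        links_after = os.stat(first).st_nlink

        # Removing one name only drops one entry; the inode survives
        # because another entry still refers to it.
        os.unlink(first)
        with open(second) as f:
            content = f.read()
        links_final = os.stat(second).st_nlink
    return links_before, links_after, same_inode, content, links_final
```

The st_nlink value is the same number that ls -l shows in its second column.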
OK: we're getting to the point now.
The directory file is a map of strings ('filenames') to numbers (inode numbers). Those inode numbers are the numbers of the inodes of the files which are 'in' that directory. The files which are 'in' that directory might include other directory files, so their inode numbers will be amongst those listed in the directory. Thus, if you have a file /tmp/foo/bar, then the directory file foo includes an entry for bar, mapping that string to the inode for that file. There's also an entry in the directory file /tmp, for the directory file foo which is 'in' the directory /tmp.
When you create a directory with mkdir(2), that function
(1) creates a directory file (with some inode number) with the correct internal structure,
(2) adds an entry to the parent directory, mapping the new directory's name to this new inode (that accounts for one of the links),
(3) adds an entry to the new directory, mapping the string '.' to the same inode (this accounts for the other link), and
(4) adds another entry to the new directory, mapping the string '..' to the inode of the directory file it modified in step (2) (this accounts for the larger number of hard links you'll see on directory files which contain subdirectories).
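The four steps above can be sketched with a toy in-memory model. This is not how any real filesystem is implemented: the class ToyFS, its method names, and the way inode numbers are handed out are all assumptions made for illustration. It only shows how the link counts fall out of treating '.' and '..' as ordinary directory entries.

```python
class ToyFS:
    """Toy model: directories are name -> inode maps, inodes carry a link count."""

    ROOT = 2  # root directory inode, following the convention described earlier

    def __init__(self):
        self.dirs = {}    # inode number -> {name: inode number}
        self.nlink = {}   # inode number -> number of directory entries naming it
        self.next_inode = self.ROOT
        root = self._new_inode()
        self.dirs[root] = {}
        self._add_entry(root, ".", root)
        self._add_entry(root, "..", root)  # in this model the root is its own parent

    def _new_inode(self):
        number = self.next_inode
        self.next_inode += 1
        self.nlink[number] = 0
        return number

    def _add_entry(self, directory, name, target):
        # Every directory entry, whatever its name, counts as one link.
        self.dirs[directory][name] = target
        self.nlink[target] += 1

    def mkdir(self, parent, name):
        new = self._new_inode()             # step 1: create the directory file
        self.dirs[new] = {}
        self._add_entry(parent, name, new)  # step 2: entry in the parent
        self._add_entry(new, ".", new)      # step 3: '.' maps to itself
        self._add_entry(new, "..", parent)  # step 4: '..' maps to the parent
        return new
```

With this model, a freshly created directory ends up with a count of 2, and each subdirectory adds one to its parent's count, with no code that treats '.' or '..' specially when counting.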
The end result is that (almost) the only special cases are:
The open(2) function tries to make it harder to shoot yourself in the foot, by preventing you opening directory files for writing.
The mkdir(2) function makes things nice and easy by adding a couple of extra entries ('.' and '..') to the new directory file, purely to make it convenient to move around the filesystem. I suspect that the filesystem would work perfectly well without '.' and '..', but would be a pain to use.
The directory file is one of the few types of files which are flagged as 'special' -- this is really what tells things like open(2) to behave slightly differently. See st_mode in stat(2).
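A small sketch of the two checks mentioned in this list, assuming a POSIX-like system. The helper names describe and try_open_for_writing are invented for this example; the calls they wrap (os.stat, stat.S_ISDIR, os.open) are standard.

```python
import errno
import os
import stat

def describe(path):
    """Report the file type recorded in st_mode."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"

def try_open_for_writing(path):
    """Return None if the open succeeds, else the symbolic errno name."""
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None
```

On Linux, calling try_open_for_writing on a directory should report EISDIR, which is open(2) refusing the request rather than the filesystem lacking the ability to store the bytes.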

Related

Linking in UNIX

I have a few questions regarding links in UNIX
Can I say Soft links in UNIX are analogous to shortcuts in windows?
Difference between copying and hard-linking?
Can anyone give me a use-case where I should prefer hard-linking over copying?
I'm so messed up right now. Any help is highly appreciated
(1) I don't know much about shortcuts in Windows, but I think it's similar but not the same. A soft link at the file system level is basically a text file containing a pathname, plus a flag marking it as a link. It can be any relative or absolute pathname anywhere on the machine. Normally any user process that opens the link is redirected by the kernel to the file it points to and doesn't even 'realize' it. Reading the link as a link requires special system calls like readlink().
When you remove the file a soft link points to, the link remains but now points to 'nowhere' and can't be read anymore.
(2) You can imagine a hard link as a second directory entry that points to the same area on the file system as the 'original file' (more exactly: it points to the same inode, which represents the location of the file and meta information like size, owner etc.). Once a hard link is made, 'original' and 'link' are indistinguishable, and if you change the file via one of the pathnames, you will see the changes via the other pathname as well. Removal is different: as long as the link count (another value stored in the inode) is greater than 1, only the directory entry is removed and the link count is decremented.
Hard links can only be made within the same file system because every file system has its own table of inodes.
(3) That follows more or less from (2). If you want the special properties of hard links (or just want to save space in the case of huge files), use a hard link; otherwise make a copy.
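To make the three cases concrete, here is a hedged sketch that puts a hard link, a soft link and a copy side by side. The file names and the function compare_links are made up for the example, and it assumes a POSIX-like system where the current user may create symbolic links.

```python
import os
import shutil
import tempfile

def read(path):
    with open(path) as f:
        return f.read()

def compare_links():
    """Show what a hard link, a soft link and a copy see after an edit and a removal."""
    result = {}
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")
        hard = os.path.join(d, "hard")
        soft = os.path.join(d, "soft")
        copy = os.path.join(d, "copy")
        with open(original, "w") as f:
            f.write("v1")
        os.link(original, hard)      # second directory entry, same inode
        os.symlink(original, soft)   # small file that stores a pathname
        shutil.copy(original, copy)  # new inode with its own data

        with open(original, "w") as f:  # rewrites the same inode in place
            f.write("v2")
        result["after_edit"] = (read(hard), read(soft), read(copy))
        result["soft_points_to_original"] = os.readlink(soft) == original

        os.unlink(original)
        result["hard_after_removal"] = read(hard)
        # The soft link still exists as a link, but its target is gone.
        result["soft_is_dangling"] = os.path.islink(soft) and not os.path.exists(soft)
    return result
```

The copy keeps the old contents, the hard link keeps working after the original name is removed, and the soft link is left pointing at 'nowhere'.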

Is there an always existing, known Unix path string guaranteed to name an always empty directory?

Under many, most, or maybe all Unix file systems, if you iterate over the links in a directory, there will usually/always be at least two, one pointing to the current directory ("./"), and one back-pointing to the parent directory ("../"). Except maybe for the root, which would have only the first of these two links.
But it might be that this is not true under some other file systems that purport to comport with most Unix conventions (but don't quite).
Is there a directory somewhere in a Unix file system guaranteed to always be an empty directory and whose link count can always be read using, e.g., stat() or equivalent?
If so, one can check the link count and expect it to be 2. Or perhaps something else, which would allow a program to adjust its behavior accordingly.
There is no standard directory which is always empty -- but you could create one, if you needed to. One easy way to do this would be using the mkdtemp() function.
However, there is no guarantee that all directories will be using the same file system. For instance, if a FAT filesystem is mounted, directories corresponding to that file system may behave differently from other ones.
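A minimal sketch of that approach, assuming Python's tempfile.mkdtemp as the stand-in for mkdtemp(). The function name empty_dir_link_count and its parent parameter are invented for the example; pass the directory whose filesystem you want to probe.

```python
import os
import tempfile

def empty_dir_link_count(parent=None):
    """Create an empty directory under `parent`, read its link count, remove it."""
    path = tempfile.mkdtemp(dir=parent)
    try:
        return os.stat(path).st_nlink
    finally:
        os.rmdir(path)
```

A result of 2 suggests the traditional '.' plus parent-entry accounting; any other value suggests that link counts on that filesystem should not be used to infer the number of subdirectories. Because the answer can differ per mount, the probe is only meaningful for the filesystem that holds the parent directory.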

What are hardlinks on UNIX? How do they vary from copies? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I'm trying to get a basic understanding of the concept of "Hard and Soft Links" in Linux.
I came across several links online, and they all said that one must get an understanding of the Unix file-system before understanding what hard and soft links are.
I will first write what I got so far, mainly for you to correct me if I'm wrong...(I'm aware that I might over simplify things, this is for the sake of... well... simplifying things):
Every file (or folder) on a Unix file-system is identified not by its name, but by a number that represents a data-structure called an inode.
The inode contains information about the file it refers to, like permissions, size, location on the disk and more...
A directory (folder) in a Unix file-system is actually no more than a list: a name-to-inode# mapping of the files it contains. This means that a file name is separated from the file content (the actual data on the disk).
So we could have something like this:
I also read, and this is where I get confused, that the name-inode map for a file isn't necessarily unique, meaning that a directory can contain two different file names that map to the same inode number (and thus to the same file in the file system, and thus to the same actual content data on the disk), and two different directories may contain identical name-inode mappings (how is it different from "copying a folder with all its content"?) so using either pathname will lead to the same actual content on disk... How exactly can you name a file with two different names on Linux, and what is it good for anyway?
btw, is there some linux tool that shows the information for a given inode?
how is it different from copying a folder with all its content?
Let's say you have an 8GB USB stick with an arbitrary folder which is 6GB in size. You want the folder to be accessible from 2 different locations, e.g. /so_very_tired/shared_big_folder and /james/foobar, but a copy in both locations won't fit on the USB stick. So you make a reference, or symbolic link, to the file/folder. The actual location for this folder can be anywhere, for example in /6GB_FOLDER
To do this, we create symbolic links /so_very_tired/shared_big_folder => /6GB_FOLDER and /james/foobar => /6GB_FOLDER
Now accessing the directory /so_very_tired/shared_big_folder and /james/foobar will lead to /6GB_FOLDER.
To find more information about files and inodes, open up a command prompt and type
$ man ls
You can use ls -i to list the inode numbers for files
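For a programmatic view of the same information, here is a rough Python equivalent. The function name list_inodes is invented for this sketch; it uses os.lstat so that a symbolic link reports its own inode rather than its target's.

```python
import os

def list_inodes(directory):
    """Return (inode number, link count, name) for each entry, roughly like ls -i."""
    rows = []
    for name in sorted(os.listdir(directory)):
        info = os.lstat(os.path.join(directory, name))
        rows.append((info.st_ino, info.st_nlink, name))
    return rows

if __name__ == "__main__":
    for inode, links, name in list_inodes("."):
        print(f"{inode:>10} {links:>3} {name}")
```

Two names that show the same inode number within one filesystem are hard links to the same file.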
They are mostly used when you want a file to appear in a place where it doesn't actually live. For example, you have a configuration file for a server in /etc/nginx/sites-available/, and to enable it you want it to appear in the /etc/nginx/sites-enabled/ folder, but you don't want to make a copy. If you create a symbolic link, all edits to the file in the sites-available folder are also seen through the symbolic link in sites-enabled. This is because one is the file and the other points to the same file.
To create a symbolic link use
ln -s /existing-file /symbolic-link

Mass Thunderbird folder to Gnus nnfolder conversions

I'm pondering the idea of importing a few thousand Thunderbird folders, each folder containing many emails of course, as a set of Emacs' Gnus mailgroups. Each mailgroup name would be derived from the folder hierarchy. Because of the quantity, the work is going to be fairly tedious, so I would automate this massive import if possible.
Among the available backends, nnfolder seems the most promising in this case. I presume it would be better to populate the mailgroups from within Gnus. Otherwise, I would have to thoroughly understand the nnfolder format, and this might require many iterations before I really get it right. Moreover, as email continues to flow in, iterations may become difficult to properly organize without losing anything.
I guess I have to respool everything, under the constraint that the selected mailgroup is a function of the Thunderbird origin, overriding the standard Gnus selection mechanism. I did some Gnus coding in the past, but since I did not touch Emacs for a dozen years, it is all very rusty. I'm a bit lost about how to approach this task as efficiently and quickly as possible. So my question: how would you handle it? Or is there some clever Gnus hidden corner that I should explore more deeply? :-)
François
P.S. After I wrote this question, I found out that Gnus has a nice helper function for this goal. The idea is to first copy all Thunderbird folder files into the ~/Mail directory, with their contents unchanged but properly renamed. Once this is done, M-x nnfolder-generate-active-file does everything at once for each copied folder: it edits the contents, leaves a ~ backup, generates NOV data, creates one mailgroup and, of course, adjusts the ~/Mail/active file.
To copy the folders underneath the ~/.thunderbird/LOGIN/Mail/Local Folders/ directory, I wrote a small Python script. It ignores all .msf files, and recurses within .sbd directories. The folder path name, relative to Local Folders/, has all its .sbd/ strings turned into periods to produce the mailgroup name, also lowering case, turning spaces and underlines to dashes, and handling other special characters appropriately. Non-ASCII characters need particular care, because nnfolder confuses UTF-8 and ISO-8859-1 here and there. The script also has to skip msgfilterrules.dat and likely drafts, junk and such things.
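The original script is not shown, so the following is only a guess at its naming rules, reconstructed from the description above. The function names mailgroup_name and should_skip are hypothetical, and the sketch deliberately leaves out the handling of other special characters and non-ASCII text, since the description does not say how those were treated.

```python
def mailgroup_name(relative_path):
    """Derive a mailgroup name from a folder path relative to Local Folders/."""
    name = relative_path.replace(".sbd/", ".")  # subfolder separators become periods
    name = name.lower()
    return name.replace(" ", "-").replace("_", "-")

def should_skip(filename):
    """Skip Thunderbird index files and the filter rules file."""
    return filename.endswith(".msf") or filename == "msgfilterrules.dat"
```

For example, under these assumed rules the folder "Work.sbd/Old Projects.sbd/Foo_Bar" would become the mailgroup "work.old-projects.foo-bar".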
I notice two details requiring attention:
Thunderbird itself can be used to compact folders before copying them, otherwise one might unwillingly recover messages which were already deleted.
(setq nnmail-use-long-file-names t) is needed in ~/.emacs prior to the whole operation.
The batch transformation aborted, saying it was not able to decrypt one of the messages. I moved the offending folder out of the way, and then the lengthy operation succeeded.

How does the UNIX kernel get an inode from a filename?

I am confused as to how the UNIX kernel gets an inode from a filename. Say I have the file "/usr/data/tmp/testme.txt". How would the kernel locate the inode for it?
Essentially, the whole path is taken apart into components and then walked from the top down, resolving directory entries and mount points. The cases of absolute and relative paths differ slightly. This seems slow, but the kernel does a fair bit of caching for name lookup. Traditionally this was the namei() function in the VFS. You can try following the (admittedly pretty hairy) code for example here.
Each directory is stored as a file of records, and in that record there is the directory-local file name ("testme.txt") and the number of the inode.
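The lookup can be sketched as a toy model in which each directory is a table of (name, inode number) records. Everything here is an assumption for illustration: the inode numbers are made up, the function is only named namei after the traditional kernel routine, and mount points, symbolic links, permissions and caching are all ignored.

```python
ROOT_INODE = 2

# inode number -> directory records (name -> inode number); the numbers are made up
DIRECTORIES = {
    2: {".": 2, "..": 2, "usr": 11},
    11: {".": 11, "..": 2, "data": 12},
    12: {".": 12, "..": 11, "tmp": 13},
    13: {".": 13, "..": 12, "testme.txt": 14},
}

def namei(path, cwd=ROOT_INODE):
    """Resolve a path to an inode number, one component at a time."""
    # Absolute paths start at the root; relative paths start at the cwd inode.
    current = ROOT_INODE if path.startswith("/") else cwd
    for component in path.split("/"):
        if not component:
            continue  # skip empty pieces from leading or doubled slashes
        entries = DIRECTORIES.get(current)
        if entries is None:
            raise NotADirectoryError(path)
        if component not in entries:
            raise FileNotFoundError(path)
        current = entries[component]
    return current
```

Note that '.' and '..' need no special handling in the loop: they are looked up like any other name, which is the benefit of storing them as real directory entries.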
[http://www.linuxquestions.org/questions/blog/mr-ameya-sathe-352399/inode-and-its-corresponding-filename-2126/]
ls -i
[http://www.cyberciti.biz/tips/understanding-unixlinux-filesystem-inodes.html]
The links above explain this.
This post can be deleted or consolidated; sorry for the duplicate question.
