I am confused as to how the UNIX kernel gets an inode from a filename. Say I have the file "/usr/data/tmp/testme.txt". How would the kernel locate the inode for it?
Essentially, the whole path is taken apart into components and then walked from the top down, resolving directory entries and mount points as it goes. Absolute and relative paths differ only slightly: the walk starts at the root inode or at the process's current-directory inode, respectively. This seems slow, but the kernel does a fair bit of caching for name lookups. Traditionally this was the namei() function in the VFS. You can try following the (admittedly pretty hairy) code, for example here.
Each directory is stored as a file of records; each record holds a directory-local file name ("testme.txt") and the number of its inode.
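To make that concrete, here is a minimal user-space sketch of the same walk, using opendir(3)/readdir(3) to look up each path component's inode number in its parent directory. It only handles absolute paths, and the default path is the (probably nonexistent) one from the question; the real kernel code also deals with mount points, symlinks, permissions and the name cache.

```c
/*
 * Sketch: resolve a path one component at a time by scanning each parent
 * directory's entries for a matching name and reporting the inode number
 * that the entry maps to.
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    /* path to resolve; defaults to the example path from the question */
    const char *path = argc > 1 ? argv[1] : "/usr/data/tmp/testme.txt";

    char buf[4096];
    strncpy(buf, path, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    char parent[4096] = "/";                      /* the walk starts at the root */
    for (char *comp = strtok(buf, "/"); comp; comp = strtok(NULL, "/")) {
        DIR *d = opendir(parent);
        if (!d) { perror(parent); return 1; }

        ino_t ino = 0;
        struct dirent *e;
        while ((e = readdir(d)) != NULL)          /* linear scan, as a simple FS would do */
            if (strcmp(e->d_name, comp) == 0) { ino = e->d_ino; break; }
        closedir(d);

        if (ino == 0) { fprintf(stderr, "%s: not found in %s\n", comp, parent); return 1; }
        printf("%-15s -> inode %lu\n", comp, (unsigned long)ino);

        /* the component just resolved becomes the parent for the next step */
        if (strcmp(parent, "/") != 0) strcat(parent, "/");
        strcat(parent, comp);
    }
    return 0;
}
```

Run it against a path that exists on your machine (e.g. ./a.out /etc/hostname) and it prints the inode chain one component at a time.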
You can see a file's inode number with ls -i. These two pages explain inodes and their relationship to filenames in more detail:
http://www.linuxquestions.org/questions/blog/mr-ameya-sathe-352399/inode-and-its-corresponding-filename-2126/
http://www.cyberciti.biz/tips/understanding-unixlinux-filesystem-inodes.html
I have a few questions regarding links in UNIX
Can I say Soft links in UNIX are analogous to shortcuts in windows?
Difference between copying and hard-linking?
Can anyone give me a use-case where I should prefer hard-linking over copying?
I'm so messed up right now. Any help is highly appreciated
I don't know much about shortcuts in Windows, but I think they are similar but not the same. A soft link at the file system level is basically a text file containing a pathname, plus a flag marking it as a link. It can be any relative or absolute pathname, anywhere on the machine. Normally any user process that opens the link is redirected by the kernel to the file it points to and doesn't even 'realize' it. Reading the link as a link itself requires special system calls like readlink().
When you remove the file a soft link points to, the link remains but now points to 'nowhere': following it fails, although readlink() will still return the stored pathname.
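A small sketch of both behaviours, assuming a writable current directory; the file names target.txt and alias.txt are made up for the example:

```c
/* Create a symlink, read its target with readlink(2), then remove the
 * target and show that the link survives but following it fails. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *target = "target.txt", *linkname = "alias.txt";

    int fd = open(target, O_CREAT | O_WRONLY, 0644);   /* make a target file */
    if (fd < 0) { perror("open"); return 1; }
    write(fd, "hello\n", 6);
    close(fd);

    if (symlink(target, linkname) < 0) { perror("symlink"); return 1; }

    char buf[256];
    ssize_t n = readlink(linkname, buf, sizeof buf - 1);  /* read the link itself */
    if (n >= 0) { buf[n] = '\0'; printf("%s -> %s\n", linkname, buf); }

    unlink(target);                                   /* now the link dangles */
    if (open(linkname, O_RDONLY) < 0)                 /* following it fails ...       */
        printf("open(%s): %s\n", linkname, strerror(errno));
    n = readlink(linkname, buf, sizeof buf - 1);      /* ... but readlink still works */
    if (n >= 0) { buf[n] = '\0'; printf("link text is still: %s\n", buf); }

    unlink(linkname);                                 /* clean up */
    return 0;
}
```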
You can imagine a hard link as a second directory entry pointing to the same area of the file system as the 'original file' (more exactly: it points to the same inode, which holds the location of the file and meta information like size, owner etc.). Once you have made a hard link, 'original' and 'link' are indistinguishable, and if you change the file via one of the pathnames, you will see the changes via the other pathname as well. That doesn't apply to removing: as long as the link count (another value stored in the inode) is greater than 1, only the directory entry is removed and the link count is decremented; the file's data is freed only when the count drops to 0.
Hard links can only be made within the same file system because every file system has its own table of inodes.
That follows more or less from the previous points. If you want to use the special properties of hard links (or just want to save space in the case of huge files), use a hard link; otherwise make a copy.
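For illustration, a sketch that creates a file, hard-links it with link(2), and prints st_nlink from stat(2) as names are added and removed; the names orig.txt and hard.txt are placeholders, and link() fails with EXDEV if the two paths are on different file systems:

```c
/* Hard links and the inode's link count. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void show(const char *path)
{
    struct stat st;
    if (stat(path, &st) == 0)
        printf("%-10s inode=%lu links=%lu\n",
               path, (unsigned long)st.st_ino, (unsigned long)st.st_nlink);
}

int main(void)
{
    close(open("orig.txt", O_CREAT | O_WRONLY, 0644));
    show("orig.txt");                 /* links=1: one directory entry            */

    if (link("orig.txt", "hard.txt") < 0) { perror("link"); return 1; }
    show("orig.txt");                 /* links=2: same inode, two names          */
    show("hard.txt");                 /* identical inode number                  */

    unlink("orig.txt");               /* removes one name, not the data          */
    show("hard.txt");                 /* back to links=1                         */

    unlink("hard.txt");               /* link count hits 0: the space is freed   */
    return 0;
}
```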
Under many, most, or maybe all Unix file systems, if you iterate over the links in a directory, there will usually/always be at least two, one pointing to the current directory ("./"), and one back-pointing to the parent directory ("../"). Except maybe for the root, which would have only the first of these two links.
But it might be that this is not true under some other file systems that purport to comport with most Unix conventions (but don't quite).
Is there a directory somewhere in a Unix file system guaranteed to always be an empty directory and whose link count can always be read using, e.g., stat() or equivalent?
If so, one can check the link count and expect it to be 2. Or perhaps something else, which would allow a program to adjust its behavior accordingly.
There is no standard directory which is always empty -- but you could create one if you needed to. One easy way to do this would be to use the mkdtemp() function.
However, there is no guarantee that all directories will be using the same file system. For instance, if a FAT filesystem is mounted, directories corresponding to that file system may behave differently from other ones.
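A sketch of that approach, assuming /tmp is available: mkdtemp(3) creates a fresh, empty directory, and stat(2) reports its link count, which will be 2 on traditional Unix file systems (the parent's entry plus the directory's own "." entry).

```c
/* Create a fresh empty directory and read its link count. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    char tmpl[] = "/tmp/emptydirXXXXXX";     /* mkdtemp replaces the XXXXXX */
    if (mkdtemp(tmpl) == NULL) { perror("mkdtemp"); return 1; }

    struct stat st;
    if (stat(tmpl, &st) == 0)
        printf("%s: %lu links\n", tmpl, (unsigned long)st.st_nlink);

    rmdir(tmpl);                             /* clean up */
    return 0;
}
```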
When I create a directory and type ls -l, it shows 2 links, but when I create a file and check the long listing via ls -l, it shows only 1 link. Can anyone tell me the reason behind this?
Long listing of the home directory (screenshot):
Here you can see that the file (e.txt) has 1 link while the directory (amit) has 2 links.
I always understood the extra link to be due to the "." entry that is created automatically when a directory is created. That is effectively a hard link to the directory.
I'm not sure, but I think this is a homework question in Maurice Bach's book. In older versions of Unix, there was no mkdir(2) system call. You had to mknod() (one link) and then make 2 additional links: one from "." to the new node (the second link), and then link ".." to the parent node (changing the parent's link count). Hence, 2 links per initial directory. I can't be sure about the exact book ("The Design of the UNIX Operating System"?), but that's why directories on Unix-like file systems have at least 2 links. It's also why they added the mkdir() system call; the earlier 3-step process was tedious and error prone.
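You can watch those counts change from a user program. This sketch (with made-up directory names) prints st_nlink after mkdir(2): a fresh directory shows 2, and creating a subdirectory bumps the parent to 3 because of the child's '..' entry.

```c
/* Directory link counts: 2 for a fresh directory, +1 per subdirectory. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void links(const char *path)
{
    struct stat st;
    if (stat(path, &st) == 0)
        printf("%-12s st_nlink = %lu\n", path, (unsigned long)st.st_nlink);
}

int main(void)
{
    mkdir("amit", 0755);
    links("amit");            /* 2: "amit" in the parent + "amit/."        */

    mkdir("amit/sub", 0755);
    links("amit");            /* 3: the new "amit/sub/.." also points here */
    links("amit/sub");        /* 2 again for the fresh subdirectory        */

    rmdir("amit/sub");        /* clean up */
    rmdir("amit");
    return 0;
}
```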
I've seen many explanations for why the link count for an empty directory in Unix based OSes is 2 instead of 1. They all say that it's because of the '.' directory, which every directory has pointing back to itself. I understand why having some concept of '.' is useful for specifying relative paths, but what is gained by implementing it at the filesystem level? Why not just have shells or the system calls that take paths know how to interpret it?
That '..' is a real link makes much more sense to me -- the filesystem needs to store a pointer back to the parent directory in order to navigate to it. But I don't see why '.' being a real link is necessary. It also seems like it leads to an ugly special case in the implementation -- you would think you could only free the space used by inodes that have a link count less than 1, but if they're directories, you actually need to check for a link count less than 2. Why the inconsistency?
Why not just have shells or the system calls that take paths know how to interpret it?
For transparency. If the filesystem does it the applications (and the myriad of system calls) don't have to do anything special with "." like "Oh, the user wants the current directory!". The notion of cwd and whatever it means is stored neatly out of the way at the FS level.
It also seems like it leads to an ugly special case in the implementation -- you would think you could only free the space used by inodes that have a link count less than 1, but if they're directories, you actually need to check for a link count less than 2.
It's not a special case. All files in Unix have a number of links. Any file you unlink is checked "Is this the last link?". If it is, it gets the chop. If not, it lingers around.
(Hmm: the following is now a bit of an epic...)
The design of the directory on unix filesystems (which, to be pedantic, are typically but not necessarily attached to unix OSs) represents a wonderful insight, which actually reduces the number of special cases required.
A 'directory' is really just a file in the filesystem. All the actual content of files in the filesystem is in inodes (from your question, I can see that you're already aware of some of this stuff). There's no structure to the inodes on the disk -- they're just a big bunch of numbered blobs of bytes, spread like peanut-butter over the disk. This is not useful, and indeed is repellent to anyone with a shred of tidy-mindedness.
The only special inode is inode number 2 (not 0 or 1, for reasons of Tradition); inode 2 is a directory file: the root directory. When the system mounts the filesystem, it 'knows' it has to readdir inode 2, to get itself started.
A directory file is just a file, with an internal structure which is intended to be read by opendir(3) and friends. You can see its internal structure documented in dir(5) (depending on your OS); if you look at that, you'll see that the directory file entry contains almost no information about the file -- that's all in the file's inode. One of the few things that's special about this file is that the open(2) function will give an error if you try to open a directory file with a mode which permits writing. Various other commands (to pick just one example, hexdump) will refuse to act in the normal way with directory files, just because that's probably not what you want to do (but that's their special case, not the filesystem's).
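A tiny sketch of that special case: opening the current directory with a writable mode is refused (EISDIR), while opendir(3)/readdir(3) happily list its entries and the inode numbers they map to.

```c
/* A directory can be listed via opendir(3), but not opened for writing. */
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (open(".", O_RDWR) < 0)                       /* writing a directory file: refused */
        printf("open(\".\", O_RDWR): %s\n", strerror(errno));

    DIR *d = opendir(".");                           /* the supported way to read it */
    if (!d) { perror("opendir"); return 1; }
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        printf("inode %8lu  %s\n", (unsigned long)e->d_ino, e->d_name);
    closedir(d);
    return 0;
}
```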
A hard link is nothing more nor less than an entry in a directory file's map. You can have two (or more) entries in such a map which both map to the same inode number: that inode therefore has two (or more) hard links. This also explains why every file has at least one 'hard link'. The inode has a reference count, which records how many times that inode is mentioned in a directory file somewhere in the filesystem (this is the number which you see when you do ls -l).
OK: we're getting to the point now.
The directory file is a map of strings ('filenames') to numbers (inode numbers). Those inode numbers are the numbers of the inodes of the files which are 'in' that directory. The files which are 'in' that directory might include other directory files, so their inode numbers will be amongst those listed in the directory. Thus, if you have a file /tmp/foo/bar, then the directory file foo includes an entry for bar, mapping that string to the inode for that file. There's also an entry in the directory file /tmp, for the directory file foo which is 'in' the directory /tmp.
When you create a directory with mkdir(2), that function
creates a directory file (with some inode number) with the correct internal structure,
adds an entry to the parent directory, mapping the new directory's name to this new inode (that accounts for one of the links),
adds an entry to the new directory, mapping the string '.' to the same inode (this accounts for the other link), and
adds another entry to the new directory, mapping the string '..' to the inode of the directory file it modified in step (2) (this accounts for the larger number of hard links you'll see on directory files which contain subdirectories).
The end result is that (almost) the only special cases are:
The open(2) function tries to make it harder to shoot yourself in the foot, by preventing you opening directory files for writing.
The mkdir(2) function makes things nice and easy by adding a couple of extra entries ('.' and '..') to the new directory file, purely to make it convenient to move around the filesystem. I suspect that the filesystem would work perfectly well without '.' and '..', but would be a pain to use.
The directory file is one of the few types of files which are flagged as 'special' -- this is really what tells things like open(2) to behave slightly differently. See st_mode in stat(2).
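For example, a minimal stat(2) sketch that reads st_mode and st_nlink for a path given on the command line (defaulting to the current directory) and uses S_ISDIR() to test the directory flag:

```c
/* The "this is a directory" flag lives in the inode's st_mode, not in the
 * directory entry; S_ISDIR() tests it. */
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";   /* defaults to the current directory */
    struct stat st;
    if (stat(path, &st) != 0) { perror(path); return 1; }

    printf("%s: inode %lu, %s, %lu links\n",
           path,
           (unsigned long)st.st_ino,
           S_ISDIR(st.st_mode) ? "directory" : "not a directory",
           (unsigned long)st.st_nlink);
    return 0;
}
```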
I am screwed. I misused wildcards like a moron in the rename command.
I repeated names twice in a 3gig folder, which I cannot afford to delete.
Now, the rename command is not working, and it says the file name is too long.
Please help me.
If programming can solve this, please let me know. I am a competent programmer in Java and PHP.
Under the hood, any rename command should get implemented with rename(). If you are in the directory where the file is and do:
mv hugefilenamethatiscreweduponandwanttobemuchshorted tersefile
it should work, as I don't think the path would get expanded out and overflow the limit. Otherwise, you can temporarily move the parent directory somewhere so that it has a minimal path (like /p), rename the file, and then move it back.
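If you'd rather do it from code, here is a sketch of the same idea using rename(2): change into the directory first so both arguments are short, relative names. The folder path below is a placeholder for wherever the file actually lives.

```c
/* Rename a file using short relative names after chdir'ing to its directory. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (chdir("/path/to/the/3gig/folder") != 0) {   /* hypothetical location */
        perror("chdir");
        return 1;
    }
    if (rename("hugefilenamethatiscreweduponandwanttobemuchshorted",
               "tersefile") != 0) {
        perror("rename");
        return 1;
    }
    return 0;
}
```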