What are hardlinks on UNIX? How do they vary from copies? [closed] - unix

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 8 years ago.
I'm trying to get a basic understanding of the concept of "Hard and Soft Links" in Linux.
I came across several links online, and they all said that one must get an understanding of the Unix file-system before understanding what hard and soft links are.
I will first write down what I've got so far, mainly for you to correct me if I'm wrong... (I'm aware that I might oversimplify things; this is for the sake of... well... simplifying things):
Every file (or folder) on a Unix file-system is identified not by its name, but by a number that represents a data-structure called an inode.
The inode contains information about the file it refers to, like permissions, size, location on the disk and more...
A directory (folder) in a Unix file-system is actually no more than a list: a name-to-inode-number mapping of the files it contains. This means that a file name is separate from the file's content (the actual data on the disk).
So we could have something like this:
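(The original example seems to be missing; as a sketch, the name-to-inode mapping can be printed with ls -ai. The directory and file names below are made up for the demo, and the inode numbers will differ on your machine.)

```shell
# Create a hypothetical demo directory with two files.
mkdir -p /tmp/inode_map_demo
touch /tmp/inode_map_demo/notes.txt /tmp/inode_map_demo/todo.txt
# Each line of output is one "inode number -> name" entry of the directory,
# including the '.' and '..' entries.
ls -ai /tmp/inode_map_demo
```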
I also read, and this is where I get confused, that the name-to-inode map for a file isn't necessarily unique. A directory can contain two different file names that map to the same inode number (and thus to the same file in the file system, and thus to the same actual content data on the disk), and two different directories may contain identical name-inode mappings (how is that different from copying a folder with all its content?), so using either pathname will lead to the same actual content on disk. How exactly can you name a file with two different names on Linux, and what is it good for anyway?
By the way, is there some Linux tool that shows the information for a given inode?

how is it different from copying a folder with all its content?
Let's say you have an 8 GB USB stick with an arbitrary folder which is 6 GB in size. You want the folder to be accessible from two different locations, e.g. /so_very_tired/shared_big_folder and /james/foobar, but copying it to both locations won't fit on the USB stick. So you make a reference, or symbolic link, to the file/folder. The actual location of this folder can be anywhere, for example /6GB_FOLDER.
To do this, we create symbolic links /so_very_tired/shared_big_folder => /6GB_FOLDER and /james/foobar => /6GB_FOLDER.
Now accessing either /so_very_tired/shared_big_folder or /james/foobar will lead to /6GB_FOLDER.
To find more information on files and inodes, open a terminal and type
$ man ls
You can use ls -i to list the inode numbers for files
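For example (the file name here is arbitrary), ls -i prints a file's inode number, and stat prints most of what the inode stores:

```shell
# Create a throwaway file to inspect.
touch /tmp/inode_demo.txt
# Prints "<inode number> /tmp/inode_demo.txt".
ls -i /tmp/inode_demo.txt
# Prints the inode's metadata: size, permissions, link count, timestamps...
stat /tmp/inode_demo.txt
```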
Symbolic links are mostly used when you want to place a link to a file where the file doesn't exist, e.g. you have a configuration file for a server placed in /etc/nginx/sites-available/ and, in order to enable it, you want to place it in the /etc/nginx/sites-enabled folder, but you don't want to make a copy. By creating a symbolic link, all edits to the file in the sites-available folder are also visible through the symbolic link in sites-enabled. This is because one is the file and the other points to the same file.
To create a symbolic link use
ln -s /existing-file /symbolic-link
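As a runnable sketch of the sites-available/sites-enabled pattern described above, using throwaway paths under /tmp rather than the real nginx directories:

```shell
# Set up a fake sites-available/sites-enabled pair.
mkdir -p /tmp/sites-available /tmp/sites-enabled
echo "server { listen 80; }" > /tmp/sites-available/example.conf
# "Enable" the site by linking, not copying (-f overwrites an old link).
ln -sf /tmp/sites-available/example.conf /tmp/sites-enabled/example.conf
# Reading through the link yields the same content; edits made via either
# path are seen through the other, since there is only one file.
cat /tmp/sites-enabled/example.conf
```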


Linking in UNIX

I have a few questions regarding links in UNIX
Can I say soft links in UNIX are analogous to shortcuts in Windows?
Difference between copying and hard-linking?
Can anyone give me a use-case where I should prefer hard-linking over copying?
I'm so messed up right now. Any help is highly appreciated
I don't know much about shortcuts in Windows, but I think it's similar but not the same. A soft link at the file-system level is basically a text file with a pathname and a flag marking it as a link. It can be any relative or absolute pathname anywhere on the machine. Normally any user process that opens that link is redirected by the kernel to the file it points to and doesn't even 'realize' it. Reading the link as a link itself requires special system calls like readlink().
When you remove the file a soft link points to, the link remains but now points to 'nowhere' and can't be read anymore.
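A quick way to see this (the file names here are arbitrary):

```shell
# Create a target and a symbolic link to it.
echo hello > /tmp/dangle_target.txt
ln -sf /tmp/dangle_target.txt /tmp/dangle_link.txt
rm /tmp/dangle_target.txt
# The link itself still exists and still stores the old pathname...
readlink /tmp/dangle_link.txt
# ...but following it now fails, because it points to 'nowhere'.
cat /tmp/dangle_link.txt 2>/dev/null || echo "dangling link"
```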
You can imagine a hard link as a second directory entry that points to the same area on the file system as the 'original file' (more exactly: it points to the same inode, which represents the location of the file and meta information like size, owner, etc.). Once a hard link is made, 'original' and 'link' are indistinguishable, and if you change the file via one of the pathnames, you will see the changes via the other pathname as well. That doesn't apply to removal: as long as the link count (another value stored in the inode) is greater than 1, only the directory entry is removed and the link count is decremented.
Hard links can only be made within the same file system, because every file system has its own table of inodes.
That follows more or less from 2. If you want to use the special properties of hard links (or just save space in the case of huge files), use a hard link; otherwise do a copy.
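The hard-link behaviour described in 2. can be checked in any shell (the paths below are made up for the demo):

```shell
# Start clean, then create a file and a hard link to it.
rm -f /tmp/hl_orig.txt /tmp/hl_alias.txt
echo "shared data" > /tmp/hl_orig.txt
ln /tmp/hl_orig.txt /tmp/hl_alias.txt      # hard link: note, no -s flag
# Both names show the same inode number and a link count of 2.
ls -li /tmp/hl_orig.txt /tmp/hl_alias.txt
# Removing one name only decrements the count; the data survives.
rm /tmp/hl_orig.txt
cat /tmp/hl_alias.txt
```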

Ada `Gprbuild` Shorter File Names, Organized into Directories

Over the past few weeks I have been getting into Ada, for various reasons. My personal reasons for using Ada are out of scope for this question.
The other day I started using the gprbuild command that comes with the Windows version of GNAT, in order to get the benefits of a system for managing my applications in a project-oriented manner. That is, being able to define certain attributes on a per-project basis, rather than manually setting up the compile phase myself.
Currently when naming my files, their names are based on what seems to be a standard for gprbuild, although I could very much be wrong: periods (in the package structure) become a - in the file name, while underscores remain as _. As such, a package by the name App.Test.File_Utils would have the file names app-test-file_utils.ads and app-test-file_utils.adb.
In the .gpr project file I have specified:
for Source_Dirs use ("app/src/**");
so that I am allowed to use multiple directories for storing my files, rather than needing to have them all in the same directory.
The Problem
The problem that arises, however, is that file names tend to get very long. As I am already putting the files in a directory based on the package name contained by the file, I was wondering if there is a way to somehow make the compiler understand that the package name can be retrieved from the file's directory name.
That is, rather than having to name the App.Test.File_Utils' file name app-test-file_utils, I would like it to reside under the app/test directory by the name file_utils.
Is this doable, or will I be stuck with the horrors of eventually having to name my files along the lines of app-test-some-then-one-has-more_files-another_package-knew-test-more-important_package.ads? That is, provided I have not missed something about how an Ada application should actually be structured.
What I have tried
I tried looking for answers in the package Naming configuration of the gpr files in the documentation, but to no avail. Furthermore, I have been browsing the web for information, but decided it might be better to get help through Stack Overflow, so that other people who might struggle with this problem in the future (granted it is a problem in the first place) might also get help.
Any pointers in the right direction would be very helpful!
In the top-secret GNAT documentation there is a description of how to use non-default file names. It's a great deal of effort. You will probably give up, use the default names, and put them all in a single directory.
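For reference, the mechanism that documentation describes is the package Naming, which lets you declare a per-unit file-name exception. A hypothetical sketch (the project, unit, and file names are invented to match the question; note that base file names must still be unique across all source dirs, which is part of why this is so much effort):

```ada
project App is
   for Source_Dirs use ("app/src/**");

   package Naming is
      --  Map the unit to a non-default file name, which gprbuild
      --  will then look up in the source dirs (e.g. app/src/app/test/).
      for Spec ("App.Test.File_Utils") use "file_utils.ads";
      for Body ("App.Test.File_Utils") use "file_utils.adb";
   end Naming;
end App;
```

One such pair of declarations is needed for every renamed unit, which is why this does not scale well by hand.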
You can also simplify much of the effort by using GPS and letting it build your project file as you add files to your source directories.

Is there an always existing, known Unix path string guaranteed to name an always empty directory?

Under many, most, or maybe all Unix file systems, if you iterate over the links in a directory, there will usually/always be at least two, one pointing to the current directory ("./"), and one back-pointing to the parent directory ("../"). Except maybe for the root, which would have only the first of these two links.
But it might be that this is not true under some other file systems that purport to comport with most Unix conventions (but don't quite).
Is there a directory somewhere in a Unix file system guaranteed to always be an empty directory and whose link count can always be read using, e.g., stat() or equivalent?
If so, one can check the link count and expect it to be 2. Or perhaps something else, which would allow a program to adjust its behavior accordingly.
There is no standard directory which is always empty -- but you could create one, if you needed to. One easy way to do this would be using the mkdtemp() function.
However, there is no guarantee that all directories will be using the same file system. For instance, if a FAT filesystem is mounted, directories corresponding to that file system may behave differently from other ones.
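If you do need such a directory from a shell, the counterpart of mkdtemp() is mktemp -d. A minimal sketch (the stat -c syntax below is GNU coreutils; BSD stat spells it differently):

```shell
# Create a fresh, empty, uniquely named directory.
d=$(mktemp -d)
# Print its link count: 2 on typical Unix filesystems
# ('.' plus the parent directory's entry).
stat -c '%h' "$d"
# Clean up.
rmdir "$d"
```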

Should I store website images in SQL Server or on the C: drive [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am programming a website in asp.net Visual Web Developer which I am going to have a lot of product pictures to display on the webpage. Should I store all my images in SQL Server and pull each picture from there or should I store all of the images in a "Picture" folder created inside of my website root folder? Is there a big difference? The Images would be linked to other tables in the database by using the Order_Number this is not a problem.
Too long for a comment.
Images in the database -- I know too many people that regret that decision. Just don't do it except perhaps in light duty usage.
Don't store the path of the image in the database. If you ever have to split images into multiple locations, you will have a big mess. Ideally you store a unique (string) identifier or hash. Then you compute, via a shared function, the correct location to pull it from based on the hashed name.
For version 1.0 you could just dump everything into a single directory (so your hash-to-directory function is very simple). Ideally you want the generated name to be "randomly distributed", i.e., as likely to be zq% as an%. You also ideally want it to be short. Unique is a requirement. For example, you could use an identity field - guaranteed unique, but not randomly distributed. If you have large numbers of images, you will want to store these in multiple directories -- so you don't essentially lock up your machine if you ever look at the directory with Windows Explorer.
A good practice is to combine methods, e.g. make a hashing function that yields 4 characters (perhaps by keeping only 4 characters of output from T-SQL HASHBYTES or CHECKSUM, hashing the identity value) and make the short hash the directory name. Now use the identity value as the filename and you have a simple and scalable design, since you can tweak the algorithm down the road if needed.
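As an illustrative sketch of that scheme (done here in a shell with sha1sum rather than T-SQL HASHBYTES; the id value and path layout are invented):

```shell
# Identity value from the database (hypothetical).
id=12345
# Keep 4 characters of the hash as the directory name...
dir=$(printf '%s' "$id" | sha1sum | cut -c1-4)
# ...and use the identity value itself as the file name.
echo "images/$dir/$id.jpg"
```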
Store them on the hard drive, this will allow IIS to cache them and serve them much more efficiently. If you make it so that requesting an image requires invoking a controller IIS cannot cache the image as a static file.

Why is '.' a hard link in Unix? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I've seen many explanations for why the link count for an empty directory in Unix based OSes is 2 instead of 1. They all say that it's because of the '.' directory, which every directory has pointing back to itself. I understand why having some concept of '.' is useful for specifying relative paths, but what is gained by implementing it at the filesystem level? Why not just have shells or the system calls that take paths know how to interpret it?
That '..' is a real link makes much more sense to me -- the filesystem needs to store a pointer back to the parent directory in order to navigate to it. But I don't see why '.' being a real link is necessary. It also seems like it leads to an ugly special case in the implementation -- you would think you could only free the space used by inodes that have a link count less than 1, but if they're directories, you actually need to check for a link count less than 2. Why the inconsistency?
Why not just have shells or the system calls that take paths know how to interpret it?
For transparency. If the filesystem does it, the applications (and the myriad of system calls) don't have to do anything special with "." like "Oh, the user wants the current directory!". The notion of the cwd and whatever it means is stored neatly out of the way at the FS level.
It also seems like it leads to an ugly special case in the
implementation -- you would think you could only free the space used
by inodes that have a link count less than 1, but if they're
directories, you actually need to check for a link count less than 2.
It's not a special case. All files in Unix have a number of links. Any file you unlink is checked "Is this the last link?". If it is, it gets the chop. If not, it lingers around.
(Hmm: the following is now a bit of an epic...)
The design of the directory on unix filesystems (which, to be pedantic, are typically but not necessarily attached to unix OSs) represents a wonderful insight, which actually reduces the number of special cases required.
A 'directory' is really just a file in the filesystem. All the actual content of files in the filesystem is in inodes (from your question, I can see that you're already aware of some of this stuff). There's no structure to the inodes on the disk -- they're just a big bunch of numbered blobs of bytes, spread like peanut-butter over the disk. This is not useful, and indeed is repellent to anyone with a shred of tidy-mindedness.
The only special inode is inode number 2 (not 0 or 1, for reasons of Tradition); inode 2 is a directory file: the root directory. When the system mounts the filesystem, it 'knows' it has to readdir inode 2, to get itself started.
A directory file is just a file, with an internal structure which is intended to be read by opendir(3) and friends. You can see its internal structure documented in dir(5) (depending on your OS); if you look at that, you'll see that the directory file entry contains almost no information about the file -- that's all in the file inode. One of the few things that's special about this file is that the open(2) function will give an error if you try to open a directory file with a mode which permits writing. Various other commands (to pick just one example, hexdump) will refuse to act in the normal way with directory files, just because that's probably not what you want to do (but that's their special case, not the filesystem's).
A hard link is nothing more nor less than an entry in a directory file's map. You can have two (or more) entries in such a map which both map to the same inode number: that inode therefore has two (or more) hard links. This also explains why every file has at least one 'hard link'. The inode has a reference count, which records how many times that inode is mentioned in a directory file somewhere in the filesystem (this is the number which you see when you do ls -l).
OK: we're getting to the point now.
The directory file is a map of strings ('filenames') to numbers (inode numbers). Those inode numbers are the numbers of the inodes of the files which are 'in' that directory. The files which are 'in' that directory might include other directory files, so their inode numbers will be amongst those listed in the directory. Thus, if you have a file /tmp/foo/bar, then the directory file foo includes an entry for bar, mapping that string to the inode for that file. There's also an entry in the directory file /tmp, for the directory file foo which is 'in' the directory /tmp.
When you create a directory with mkdir(2), that function
creates a directory file (with some inode number) with the correct internal structure,
adds an entry to the parent directory, mapping the new directory's name to this new inode (that accounts for one of the links),
adds an entry to the new directory, mapping the string '.' to the same inode (this accounts for the other link), and
adds another entry to the new directory, mapping the string '..' to the inode of the directory file it modified in step (2) (this accounts for the larger number of hard links you'll see on directory files which contain subdirectories).
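The accounting in the steps above is easy to verify (the directory names here are arbitrary):

```shell
# Start clean, then create a fresh directory.
rm -rf /tmp/lc_demo
mkdir /tmp/lc_demo
stat -c '%h' /tmp/lc_demo   # 2: the parent's entry plus its own '.'
# Adding a subdirectory adds that subdirectory's '..' entry.
mkdir /tmp/lc_demo/sub
stat -c '%h' /tmp/lc_demo   # now 3
```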
The end result is that (almost) the only special cases are:
The open(2) function tries to make it harder to shoot yourself in the foot, by preventing you opening directory files for writing.
The mkdir(2) function makes things nice and easy by adding a couple of extra entries ('.' and '..') to the new directory file, purely to make it convenient to move around the filesystem. I suspect that the filesystem would work perfectly well without '.' and '..', but would be a pain to use.
The directory file is one of the few types of files which are flagged as 'special' -- this is really what tells things like open(2) to behave slightly differently. See st_mode in stat(2).
