How are soft link inode values determined? - unix

In typical UNIX file systems, data and directory information are
both stored in structures called inodes. Figure 1 shows how a file
in directory/folder A named TheMatrix.jpg is stored starting at
inode 34119, where a set of file information and a list of locations
of its actual data are given. The first block containing actual data
is at inode 213456.
A second file called Link2Matrix.jpg in directory/folder B is linked to
the same file data. Indicate what would be the value of inode X
(shown inside the directory B block at the bottom left) when this
file is: (i) hard linked to TheMatrix.jpg; (ii) symbolically (or soft) linked to TheMatrix.jpg.
Block diagram of inode based file system
I understand that hard linking a file results in inode X taking on the same value as the linked file, i.e. 34119.
My understanding is that a soft link, however, results in a new inode value being generated. I do not know how this value is assigned, and it appears to be quite arbitrary. Is there a clue somewhere in the diagram that indicates what inode X would be if a soft link were created?
TIA!
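One way to check the hard link versus soft link behaviour on a real system is to create both kinds of link and compare inode numbers. This is only a sketch in Python, not part of the exercise: the file names echo the figure, the temporary directory and the two link names are made up for illustration, and the actual numbers depend on the file system, so none are shown.

```python
import os
import tempfile


def link_inodes():
    """Create a file, a hard link and a symlink; return their inode numbers."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "TheMatrix.jpg")
        hard = os.path.join(d, "HardLink2Matrix.jpg")   # hypothetical name
        soft = os.path.join(d, "SoftLink2Matrix.jpg")   # hypothetical name
        with open(target, "wb") as f:
            f.write(b"data")
        os.link(target, hard)      # new directory entry pointing at the same inode
        os.symlink(target, soft)   # new file with its own inode, holding a path
        return {
            "target": os.stat(target).st_ino,
            "hard": os.stat(hard).st_ino,
            "soft_itself": os.lstat(soft).st_ino,    # lstat does not follow the link
            "soft_followed": os.stat(soft).st_ino,   # stat follows it to the target
        }


inodes = link_inodes()
print(inodes)
```

The hard link reports the target's inode, while the symlink itself gets whichever free inode the file system hands out at creation time. That suggests a diagram can only pin down X for the hard-link case.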

Related

UNIX: Is the i-number same as the file descriptor?

Dennis Ritchie and Ken Thompson's paper UNIX Time-Sharing System mentions the following points
About i-number: A directory entry contains only a name for the associated file and a pointer to the file itself. This pointer is an integer called the i-number (for index number) of the file
About open and create system-calls: The returned value (of open and create) is called a file descriptor. It is a small integer used to identify the file in subsequent calls
Purpose of open/create: The purpose of an open or create system call is to turn the path name given by the user into an i-number by searching the explicitly or implicitly named directories
Does this mean that the file descriptor is just the i-number of a file? Or am I missing something?
A file descriptor in UNIX is basically just an index into the array of open files for the current process.
An inode number is an index into the inode table for the file system.
So they're basically just integers, indexes into an array, but they are indexes into completely different, unrelated arrays. So there is no connection between them.
To add to Chris Dodd's answer, not only are inode numbers and file descriptor numbers not directly related, it wouldn't be practical for them to be.
Inode numbers are unique to each file system. Imagine if you opened fileA on a file system (say, /mnt) with inode number 100, and in the same process also opened fileB on another filesystem (say, /mnt2) which also happened to have inode number 100. What should the file descriptors be in that case?
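To make the distinction concrete, here is a small Python sketch (the file name and temporary directory are assumptions for illustration). It opens the same file twice and compares the two file descriptors with the inode number reported for each.

```python
import os
import tempfile


def fd_and_inode():
    """Open one file twice; return both descriptors and both inode numbers."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "fileA")
        with open(path, "w") as f:
            f.write("x")
        fd1 = os.open(path, os.O_RDONLY)
        fd2 = os.open(path, os.O_RDONLY)
        try:
            # fstat reports the inode behind each descriptor
            return fd1, fd2, os.fstat(fd1).st_ino, os.fstat(fd2).st_ino
        finally:
            os.close(fd1)
            os.close(fd2)


fd1, fd2, ino1, ino2 = fd_and_inode()
print("descriptors:", fd1, fd2, "inodes:", ino1, ino2)
```

The two descriptors differ because each is a separate slot in the process's table, yet both report the same inode number because they refer to the same file.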

What is the "buffer" in the Atom editor?

In describing how find (& find and replace) work, the Atom Flight Manual refers to the buffer as one scope of search, with the entire project being another scope. What is the buffer? It seems like it would be the current file, but I expect it is more than that.
From the Atom Flight Manual:
A buffer is the text content of a file in Atom. It's basically the same as a file for most descriptions, but it's the version Atom has in memory. For instance, you can change the text of a buffer and it isn't written to its associated file until you save it.
Also came across this from The Craft of Text Editing by Craig Finseth, Chapter 6:
A buffer is the basic unit of text being edited. It can be any size, from zero characters to the largest item that can be manipulated on the computer system. This limit on size is usually set by such factors as address space, amount of real and/or virtual memory, and mass storage capacity. A buffer can exist by itself, or it can be associated with at most one file. When associated with a file, the buffer is a copy of the contents of the file at a specific time. A file, on the other hand, can be associated with any number of buffers, each one being a copy of that file's contents at the same or at different times.
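The buffer/file distinction can be sketched in a few lines of Python. The Buffer class below is purely hypothetical (it is not Atom's API); it only illustrates that edits live in memory until an explicit save.

```python
import os
import tempfile


class Buffer:
    """Hypothetical in-memory copy of a file's text (not Atom's actual API)."""

    def __init__(self, path):
        self.path = path
        with open(path, encoding="utf-8") as f:
            self.text = f.read()       # snapshot of the file at load time

    def save(self):
        with open(self.path, "w", encoding="utf-8") as f:
            f.write(self.text)         # only now does the file change


def demo():
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello")
        buf = Buffer(path)
        buf.text += " world"           # edit the buffer, not the file
        with open(path, encoding="utf-8") as f:
            before_save = f.read()
        buf.save()
        with open(path, encoding="utf-8") as f:
            after_save = f.read()
        return before_save, after_save
    finally:
        os.unlink(path)


before_save, after_save = demo()
```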

Can the __LINKEDIT segment of a Mach-O executable be moved

In a Mach-O executable, I am trying to increase the size of the __LLVM segment that precedes the __LINKEDIT segment (with a home-grown tool). I am considering two strategies: (a) move the __LLVM segment to after the __LINKEDIT segment, producing a file that is not what ld would create (now with a gap and section addresses out of order), and (b) move the __LINKEDIT segment to allow resizing of the __LLVM segment that precedes it. I need the result to be accepted for downstream processing, e.g. generating an .ipa file or sending to the App Store.
This question is about my assumptions and the viability of these approaches. Specifically, what are the potential pitfalls of each that might lead them to fail?
I implemented the first approach (a). The result is understood by segedit's -extract option, but its -replace option complains that the segments are out of order. I append a new segment to the file and update the address and length values in the corresponding load command to refer to this new segment data (both in the file and in the destination memory). This might be fine, as long as the other downstream processing accepts the result (still to check; e.g. any local signature is likely invalidated).
The second approach (b) would seem cleaner, as long as there are no references into the __LINKEDIT segment, which I guess contains linking information (symbol tables etc., rather than code). I have not tried this yet, though it seems to be a foregone conclusion that segedit will be happy with the result, which may suggest other processing might also be happier. Are there likely to be any references that are invalidated due to simply moving this segment? I am guessing that I will have to update further load commands (they seem to reference into the __LINKEDIT segment), which I have not examined, but this should be fairly straightforward.
EDIT: Replaced my confused use of "section" with "segment" (mentioned in answer).
ADDED: The context is that I have no control over generating the original executable. I need to post-process it, essentially performing a 'segedit -replace' process, wherein a section in the segment is to be replaced with a section that is larger than the space previously allocated for the segment.
RUN-ON clarifying question: It seems from the answer that moving the __LINKEDIT segment will break it. Can this be fixed by adjusting load commands only (e.g. LC_DYLD_INFO_ONLY, LC_LOAD_DYLINKER, LC_LOAD_DYLIB), not data in any segments? I am not yet familiar with these load commands, and would like to know whether to pursue this.
So basically the segments and sections describe how the physical file maps onto virtual memory.
As I mentioned in a previous iteration of this answer, there are limitations on the segment order:
The __TEXT segment must start at physical file offset 0 of the executable.
The __LINKEDIT segment must not start at physical file offset 0.
__LINKEDIT's file offset + file size should be equal to the physical executable size (this implies __LINKEDIT being the last segment). Otherwise code signing won't work.
The LC_DYLD_INFO_ONLY load command contains file offsets to the opcode streams that dyld processes at load time for:
rebase
bind at load
weak bind
lazy bind
export
For each kind there is a file offset and size entry in LC_DYLD_INFO_ONLY describing data in the file that lies within __LINKEDIT (in a "regular" ld-linked executable). LC_DYLD_INFO_ONLY does not use any segment or section information from __LINKEDIT directly; the file offsets and sizes are enough.
EDIT: Also, as mentioned in @kirelagin's answer:
"Apparently, the new version of dyld from 10.12 Sierra performs a check that previous versions did not perform: it makes sure that the LC_SYMTAB symbols table is entirely within the __LINKEDIT segment."
I assume that, since you want to inflate the size of the preceding __LLVM segment, you would also want some extra data in the file itself. Typically the data described by __LINKEDIT (i.e. not the segment & sections themselves, but the actual data) won't use 100% of its space, so it could be modified to start "later" and occupy less space.
A tool called jtool by Jonathan Levin could probably do it for you.
I know this is an old question, but I solved this problem while solving another problem.
Define the slide amount. It must be page-aligned, so I chose 0x4000.
Add the slide amount to the relevant load commands. This includes, but is not limited to:
__LINKEDIT segment (duh)
dyld_info_command
symtab_command
dysymtab_command
linkedit_data_commands
Physically move the __LINKEDIT data in the file.
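The load-command part of these steps can be sketched roughly as follows. This is an illustration in Python, not the tool used in the answer, and it rests on several assumptions: a thin 64-bit little-endian image, field positions as I read them from <mach-o/loader.h>, and the __LINKEDIT vmaddr sliding by the same amount as its file offset. The function name and the OFFSET_FIELDS table are invented for the example. It does not move any data, and it does not repair the code signature.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
LC_SEGMENT_64 = 0x19

# cmd -> byte positions (inside the load command) of 32-bit file-offset fields
OFFSET_FIELDS = {
    0x02: (8, 16),                      # LC_SYMTAB: symoff, stroff
    0x0B: (32, 40, 48, 56, 64, 72),     # LC_DYSYMTAB: tocoff .. locreloff
    0x22: (8, 16, 24, 32, 40),          # LC_DYLD_INFO: rebase .. export offsets
    0x80000022: (8, 16, 24, 32, 40),    # LC_DYLD_INFO_ONLY
    0x1D: (8,),                         # LC_CODE_SIGNATURE: dataoff
    0x1E: (8,),                         # LC_SEGMENT_SPLIT_INFO: dataoff
    0x26: (8,),                         # LC_FUNCTION_STARTS: dataoff
    0x29: (8,),                         # LC_DATA_IN_CODE: dataoff
}


def slide_linkedit_commands(image: bytearray, slide: int, page_size: int = 0x4000) -> None:
    """Add `slide` to __LINKEDIT-related offsets in the load commands, in place."""
    if slide % page_size:
        raise ValueError("slide must be page-aligned")
    (magic,) = struct.unpack_from("<I", image, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("expected a thin 64-bit little-endian Mach-O")
    (ncmds,) = struct.unpack_from("<I", image, 16)
    pos = 32                                        # size of mach_header_64
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", image, pos)
        if cmd == LC_SEGMENT_64:
            segname = bytes(image[pos + 8:pos + 24]).rstrip(b"\0")
            if segname == b"__LINKEDIT":
                for field in (24, 40):              # vmaddr, fileoff (64-bit each)
                    (value,) = struct.unpack_from("<Q", image, pos + field)
                    struct.pack_into("<Q", image, pos + field, value + slide)
        for field in OFFSET_FIELDS.get(cmd, ()):
            (value,) = struct.unpack_from("<I", image, pos + field)
            if value:                               # 0 means "table not present"
                struct.pack_into("<I", image, pos + field, value + slide)
        pos += cmdsize
```

After patching the header this way, the __LINKEDIT bytes themselves still have to be moved by the same amount in the file, and any existing signature has to be regenerated.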

data deletion with concurrent reading/writing in file system

Most file systems use locking to handle concurrent reads and writes. But what if, after a read call, a write call is executed which deletes the data preceding the previous read call?
Is the pointer for a file open for reading updated to reflect the new start of the now smaller file?
The question isn't really valid, because you can't delete data using the write system call. You can overwrite data using the write(2) system call, but you can't delete data. What you can do is truncate the file using the truncate(2) system call. This changes the size of the file (reported via the st_size field by the stat(2) system call), and any data beyond the new size is discarded. You can also ask truncate to increase the size of the file by requesting a new size which is larger than the current size. It is undefined (per the POSIX specification) whether this is allowed, or what the system will do when it receives a truncate request larger than the current size of the file. On many file systems it will simply set the size of the file to the requested size, and the added bytes read as zeros.
OK, a few more concepts. Associated with each open file structure is a file offset pointer. Attempts to read or write a file using the read(2) or write(2) system call will advance the offset pointer by the number of bytes read or written. If you open a file twice using the open(2) system call, you will get two file descriptors, which each refer to a different open file structure, and in that case, a read(2) or write(2) using one file descriptor will not change the file offset for the other file descriptor. (If you clone a file descriptor using the dup(2) system call, then you will get a second file descriptor which points to the same file structure, and then changes made to the file structure via one file descriptor, using the read(2), write(2), or lseek(2) system calls, will be reflected via the cloned file descriptor. But that's a side issue, so that's all I will say on this topic for now.)
Now, if you truncate the file, this doesn't change the file offset associated with the file descriptor. So the answer is that the file offset pointer won't be updated after the truncate. An attempt to read at an offset beyond the truncated size of the file returns 0 bytes (end of file); zeros are returned only for a region that is later added back by extending the file.
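The difference between opening a file twice and duplicating a descriptor can be checked with a short Python sketch. The temporary file and variable names are assumptions for illustration; os.open, os.dup, os.read and os.lseek are thin wrappers over the system calls named above.

```python
import os
import tempfile


def offsets_demo():
    """Return the offsets seen through two opens and one dup after a 4-byte read."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        a = os.open(path, os.O_RDONLY)   # its own open file structure
        b = os.open(path, os.O_RDONLY)   # a second, independent one
        c = os.dup(a)                    # shares a's open file structure
        try:
            os.read(a, 4)                # advances the offset behind a (and c)
            return (
                os.lseek(a, 0, os.SEEK_CUR),
                os.lseek(b, 0, os.SEEK_CUR),
                os.lseek(c, 0, os.SEEK_CUR),
            )
        finally:
            for d in (a, b, c):
                os.close(d)
    finally:
        os.close(fd)
        os.unlink(path)


offset_a, offset_b, offset_c = offsets_demo()
print(offset_a, offset_b, offset_c)   # 4 0 4
```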
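A small Python experiment makes the behaviour after truncation easy to verify. This is a sketch with an assumed temporary file. On the systems I am aware of, a read at an offset past the new end of file yields no bytes, and zeros only show up once the file has been extended again.

```python
import os
import tempfile


def truncate_demo():
    """Truncate a file behind a reader's offset and observe what the reader sees."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        reader = os.open(path, os.O_RDONLY)
        try:
            os.read(reader, 8)                          # reader offset is now 8
            os.ftruncate(fd, 4)                         # file shrinks to 4 bytes
            offset = os.lseek(reader, 0, os.SEEK_CUR)   # offset is left untouched
            past_end = os.read(reader, 4)               # at/after EOF: empty result
            os.ftruncate(fd, 12)                        # extend; bytes 4..11 are zeros
            regrown = os.read(reader, 4)                # reads bytes 8..11
            return offset, past_end, regrown
        finally:
            os.close(reader)
    finally:
        os.close(fd)
        os.unlink(path)


offset, past_end, regrown = truncate_demo()
print(offset, past_end, regrown)   # 8 b'' b'\x00\x00\x00\x00'
```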

What is the naming convention for Unix FIFOs/named pipes?

Various manual pages commonly exemplify FIFOs being opened in the /tmp directory, but they do not share a common naming convention. When I list the contents of my /tmp directory, I get nothing but directories named like /tmp/ssh-5oRuBPhI9lv9. Is there a convention, especially/specifically for IPC?
There is no official naming convention.
Sure, when using FIFOs, you will need some convention, since FIFOs are typically used for process communication between unrelated processes. So the name must be known to the different processes, which implies you have to follow some sort of convention, but it's your call.
The reason you see directories and files with mysterious names in /tmp is usually the result of the corresponding processes calling mkstemp(3) or mkdtemp(3). These functions atomically generate a unique name and create the corresponding file / directory.
If for some reason you want your FIFO to have a similar name, you can generate a unique name with tmpnam(3) and then pass that name to mkfifo(3). But note that there is a window of time between the call to tmpnam(3) and the call to mkfifo(3) where another process could create a file with the same name (and then mkfifo(3) would fail). If that's a problem, you could instead atomically create a temporary directory with mkdtemp(3) and then create the FIFO inside that directory with a name of your choice.
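The mkdtemp-then-mkfifo approach might look like this in Python, where tempfile.mkdtemp and os.mkfifo wrap the corresponding C functions. The "ipc-" prefix, the FIFO name and the helper function are made up for the example, and the final cleanup is only there to keep the demonstration tidy.

```python
import os
import stat
import tempfile


def make_private_fifo(name="fifo"):
    """Create a FIFO with a chosen name inside a freshly created private directory."""
    directory = tempfile.mkdtemp(prefix="ipc-")   # unique, accessible only to the creator
    path = os.path.join(directory, name)
    os.mkfifo(path, 0o600)                        # no race: nobody else can write in directory
    return path


fifo_path = make_private_fifo()
created_as_fifo = stat.S_ISFIFO(os.stat(fifo_path).st_mode)
print(fifo_path, created_as_fifo)

# Clean up the demonstration FIFO and its directory.
os.unlink(fifo_path)
os.rmdir(os.path.dirname(fifo_path))
```

The other process still needs to learn the generated directory name somehow, for example via an environment variable or a command-line argument, which is the rendezvous problem again.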
The reason there is no sure way to atomically generate and create a temporary, uniquely named FIFO is that FIFOs are used as rendezvous points for unrelated processes, so in general the name must be known a priori. Having a FIFO with a unique temporary name would make it harder for other processes to find it, which kind of defeats the purpose.
