I'm trying to get familiar with a large project that was possibly written originally in Allegro Common Lisp. I have come across this piece of code:
(load "epilog:lib;compile.lisp")
Could anyone please explain what it means? Perhaps, if that helps, "epilog" is the name of a package and "lib;compile.lisp" is the file "lib/compile.lisp", or so I would guess.
Is this a standard way to do something? And if so, what was the intention of this code? SBCL doesn't recognize the colon as a special character in a file name; it reports Couldn't load "epilog:lib;compile.lisp": file does not exist.
Logical Pathnames are a standard Common Lisp feature
It's not a symbol, it is a logical pathname.
Common Lisp has a portable logical pathname facility. The purpose is to abstract from physical pathnames like /usr/local/lisp/src/epilog/lib/compile.lisp or lispm:>sources>epilog>lib>compile.lisp.432 or any other type of pathname (just think of the differences between Unix, Mac OS X, Windows, ...).
The purpose is to use one single pathname scheme and one single logical file organization for your software. Regardless of what machine you are on and where your files are, all you need is a mapping from the real file organization into the logical Lisp organization.
History
This facility came from a time when there were lots of different operating systems and many different file systems (DEC VMS, IBM MVS, Multics, Unix, Lisp Machines, MS DOS, Macs, ...). The Lisp Machines were networked and could talk to all kinds of computers - so they learned the native file syntax for all of them. In different laboratories (MIT, Xerox, SRI, ...) there were different machines on the network and different file servers. But the Lisp users wanted to load epilog:src;load.lisp and not have to remember where the stuff really is: on the local machine? But where? On a file server? But where? So on each network there was a registry for the translations from real file locations to logical pathnames.
So this is like an early 'URI' facility for files - Uniform Resource Identifiers.
The example explained
"epilog:lib;compile.lisp" is the name of a logical pathname.
epilog is the name of the logical host
lib; is the directory path
compile is the file name
lisp is the file type
Logical Pathname Translations
What you need is a translation between logical pathnames and physical pathnames:
Let's say we have a logical host EPILOG with just one translation rule. All files are on this machine for this Lisp under /usr/local/sources/epilog/. So we use some Unix conventions.
CL-USER 40 > (setf (logical-pathname-translations "EPILOG")
`(("**;*.*" "/usr/local/sources/epilog/**/*.*")))
(("**;*.*" "/usr/local/sources/epilog/**/*.*"))
The above has only one translation rule:
From EPILOG:**;*.* to /usr/local/sources/epilog/**/*.*.
It maps the logical host and all its subdirectories to a directory in a Unix file system.
One could have more rules:
the documentation might be in a different place
there might be data files on a larger file system
compiled fasl files might be stored somewhere else
it might use logical subdirectories from other physical directories
But, again, here we use only one translation rule.
The example explained - part 2
Now we can parse a logical pathname:
CL-USER 41 > (pathname "epilog:lib;compile.lisp")
#P"EPILOG:LIB;COMPILE.LISP"
Let's describe it:
CL-USER 42 > (describe *)
#P"EPILOG:LIB;COMPILE.LISP" is a LOGICAL-PATHNAME
HOST "EPILOG"
DEVICE :UNSPECIFIC
DIRECTORY (:ABSOLUTE "LIB")
NAME "COMPILE"
TYPE "LISP"
VERSION NIL
As you see above, the parts have been parsed from our string.
Now we can also see how a logical pathname translates into a real pathname:
Translate a Logical Pathname to a physical pathname
CL-USER 43 > (translate-logical-pathname "epilog:code;ui;demo.lisp")
#P"/usr/local/sources/epilog/code/ui/demo.lisp"
So, when you call (load "epilog:lib;compile.lisp"), Lisp will translate the logical pathname and then actually load the file from the translated physical pathname. What we also really want is that Lisp, for all purposes, remembers the logical pathname - not the physical one. For example, when the file defines a function named FOO, we want Lisp to record the location of the source of the function - but using the logical pathname. This way you can move a compiled file, a compiled application or a Lisp image to a different computer, update the translations, and it will immediately be able to locate the source of FOO - if it is available on that machine or somewhere on a network accessible to that machine.
Logical Pathnames need to have a translation
To work with a logical pathname one needs to have a logical pathname translation like the one above. Often the translations are stored in a file by themselves. Define the translation, load it, and then you can use the corresponding logical pathnames to compile and load files. A typical software system using them thus needs a corresponding translation. Sometimes it needs to be edited to match your file paths, but sometimes the translations can be computed while the translations file is loaded. You'd have to look at where and how the logical host and the translations are defined.
History part 2
On a Symbolics Lisp Machine there is a site-wide directory where systems and logical pathnames can be registered. Loading a system can then look up the system definition using this central directory, and it also usually loads a translations file. Thus the mechanism tells you what the structure of the system is (files, versions, patches, system versions, ...) and where it is located (which can be scattered over several hosts or file systems).
Logical pathnames are not much used in newer software - you will sometimes encounter them in older software, especially software that ran on Lisp Machines, where this feature was used extensively throughout the system.
Related
I have a problem. Universally, my experience working in Unix systems has been that, by the time you are ready to place an executable "thing" in a bin folder for global access, you have decided to #! the file with the requisite interpreter:
#!/bin/awk
#!/bin/bash
#!/bin/perl
#!/bin/python3.8
#!/bin/whatever
And, although it is fine to have clutter at the local scope, when one places an executable in the bin folder, it should have:
A POSIX CLI interface
No discernible language tags or what have you
This is because it is now intended to be used for difficult work that requires forgetting about the details of this or that language: one now needs to think in terms of the functions as if the composable units are part of a consistent language, rather than a dozen different languages from a dozen different expert contributors.
This is the "genius" of the Unix/Linux/Posix architecture.
Anyways, when structuring my python projects, the end game is copying python executables to a global source on the path -- whether that "global" source is a pretend global source in my home directory (i.e., ~/.mytools/bin) or the actual global path (/usr/bin or something like that) -- and generally I want my python executables to have the same "game feel" as C executables, perl executables, BASH/ZSH/etc. executables.
In that vein, I knock off the extensions from my scripts and executables when they go in the bin. There is no need to know, from my usage perspective, what anything is made of when I go to use it.
However, streamlit requires me to re-append the .py to the file in the global path in order to run it with streamlit run. This is a case of the library reaching beyond its useful scope and holding me hostage, from my perspective, unless I violate best practices when extending the bin folder with python executables.
This means I have to create special logic to handle just streamlit, and that is really a kerfuffle. I have to either change the way I handle all executables, or hardcode just the executable that will be run with streamlit. That means that, all of a sudden, I have an arbitrary name in my meta-control code for my project.
That is bad. Why? Because I have to remember that I did it, and remember to change it if I change the executable name. I also have to remember to add to it if I add another streamlit executable.
Alternatively, I can copy all my exes made with python into the root bin folders with their .py extensions, which is not what I wanted to do.
How does one bypass this issue in streamlit?
If bin/sometool needs to be invoked with Streamlit via streamlit run bin/sometool, it seems like you're already exposing "meta-control code" to users of your bin script, right?
Instead, would this solve your problem?
bin/sometool:
#!/bin/bash
# Resolve the directory this wrapper lives in, then hand the real
# Streamlit script to `streamlit run`.
DIR=$(dirname "$0")
streamlit run "$DIR"/the_actual_script.py
(Where the_actual_script.py sits inside bin, but has had its execute bit removed with chmod -x so that it's not directly executable.)
Let us assume we have a static file server (Nginx + Linux) that serves 10 files. The files are read almost as frequently as the server can process. However, some of the files need to be replaced with new versions, so that the filename and URL address remain unaltered. How to replace the files safely without a fear that some reads fail or become a mix of two versions?
I understand this is a rather basic operating system matter and has something to do with renames, symlinks, and file sizes. However, I failed to find a clear reference or a good discussion and I hope we can build one here.
Use rsync. Typically I choose rsync -av src dst, but YMMV.
What is terrific about rsync is that, in addition to having essentially zero cost when little or nothing has changed, it uses an atomic rename. During the file transfer, a ".fooNNNNN" temp file gets bigger and bigger. Once the transfer completes, rsync closes the file and renames it on top of "foo". So web clients either see all of the old file or all of the new one. Notice that range downloads (say, restarting after an error) are not atomic, exposing such clients to lossage, especially if bytes were inserted near the beginning of the file. A SHA1 checksum wouldn't validate for such a client, and the client would have to restart the download from scratch. BTW, if these are "large" files, tell nginx to use zero-copy sendfile().
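The primitive rsync relies on here is rename(2), which atomically replaces the destination name. Below is a minimal C sketch of the same write-to-a-temp-file-then-rename pattern; the paths and file content are made up for illustration, not taken from the answer above.

/* Write the new version to a temp file on the same filesystem,
 * then publish it with an atomic rename(2). Paths are illustrative. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    const char *final_path = "served/foo.html";      /* file the web server reads */
    const char *tmp_path   = "served/.foo.html.tmp"; /* temp file, same filesystem */
    const char *new_body   = "<html>new version</html>\n";

    /* Write the complete new version to the temporary file first. */
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, new_body, strlen(new_body)) < 0) { perror("write"); return 1; }
    if (fsync(fd) < 0) { perror("fsync"); return 1; }  /* flush data before publishing */
    close(fd);

    /* rename(2) atomically replaces the old name: readers see either the
     * complete old file or the complete new one, never a mix of the two. */
    if (rename(tmp_path, final_path) < 0) { perror("rename"); return 1; }
    return 0;
}

The key constraint is that the temp file and the final name must live on the same filesystem, otherwise rename(2) fails and the copy-then-rename trick loses its atomicity.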
What's the real difference between these two commands? Why is the system call to delete a file called unlink instead of delete?
You need to understand a bit about the original Unix file system to understand this very important question.
Unlike other operating systems of its era (late 60s, early 70s), Unix did not store the file name together with the actual directory information (of where the file was stored on the disks). Instead, Unix created a separate "inode table" to contain that information and identify the actual file, and then allowed separate text files to serve as directories of names and inodes. Originally, directory files were meant to be manipulated like all other files as straight text files, using the same tools (cat, cut, sed, etc.) that shell programmers are familiar with to this day.
One important consequence of this architectural decision was that a single file could have more than one name! Each occurrence of the inode number in a particular directory file essentially linked a name to that inode, and that is what such an entry was called. To connect a file name to the file's inode (the "actual" file), you "linked" it, and when you deleted the name from a directory you "unlinked" it.
Of course, unlinking a file name did not automatically mean that you were deleting / removing the file from the disk, because the file might still be known by other names in other directories. The Inode table also includes a link count to keep track of how many names an inode (a file) was known by; linking a name to a file adds one to the link count, and unlinking it removes one. When the link count drops down to zero, then the file is no longer referred to in any directory, presumed to be "unwanted," and only then can it be deleted.
For this reason, the "deletion" of a file by name unlinks it - hence the name of the system call - and there is also the very important ln command to create an additional link to a file (really, to the file's inode) and let it be known by another name.
Other, newer operating systems and their file systems have to emulate / respect this behavior in order to comply with the POSIX standard.
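Here is a minimal C sketch of that link-count behavior using the link(2) and unlink(2) system calls; the file names "original.txt" and "alias.txt" are made up for illustration.

/* Sketch: how link(2)/unlink(2) affect an inode's link count. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

static void show_link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) == 0)
        printf("%s: inode %lu, link count %lu\n",
               path, (unsigned long)st.st_ino, (unsigned long)st.st_nlink);
}

int main(void)
{
    int fd = open("original.txt", O_WRONLY | O_CREAT, 0644);  /* create the file: link count 1 */
    if (fd >= 0) close(fd);

    link("original.txt", "alias.txt");   /* add a second name: link count goes 1 -> 2 */
    show_link_count("original.txt");

    unlink("original.txt");              /* remove one name: link count goes 2 -> 1 */
    show_link_count("alias.txt");        /* the data is still reachable via the other name */

    unlink("alias.txt");                 /* link count drops to 0: only now is the file deleted */
    return 0;
}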
Suppose that a request is made to ls somefile. How does the file system in UNIX handle this request from an algorithmic perspective? Is it an O(1) query, or O(log(N)) in the number of files (say, starting from the current directory node), or is it an O(N) linear search, or some combination depending on certain parameters?
It can be O(n). Classic Unix file systems, based on the old-school BSD fast file system and the like, store files under inode numbers, and their names are assigned at the directory level, not at the file level. This allows you to have the same file present in multiple locations at the same time, via hard links. As such, a "directory" in most Unix systems is just a file that lists filenames and inode numbers for all the files stored "in" that directory.
Searching for a particular filename in a directory just means opening that directory file and parsing through it until you find the filename's entry.
Of course, there are many different file systems available for Unix systems these days, and some will have completely different internal semantics for finding files, so there's no one "right" answer.
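To make the "parsing through it" concrete, here is a hedged C sketch of that linear scan using opendir(3)/readdir(3); the default name to search for is just an example.

/* Look for one name in the current directory by reading entries one by
 * one - the O(n) scan that a classic Unix directory lookup boils down to. */
#include <stdio.h>
#include <string.h>
#include <dirent.h>

int main(int argc, char **argv)
{
    const char *wanted = (argc > 1) ? argv[1] : "somefile";  /* name to look up */

    DIR *dir = opendir(".");                 /* open the directory "file" */
    if (!dir) { perror("opendir"); return 1; }

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) { /* walk entries until a match */
        if (strcmp(entry->d_name, wanted) == 0) {
            printf("found %s (inode %lu)\n", wanted, (unsigned long)entry->d_ino);
            closedir(dir);
            return 0;
        }
    }
    closedir(dir);
    printf("%s not found\n", wanted);
    return 1;
}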
It's O(n), since the file system has to read the directory off physical media initially, but buffer caches will speed that up significantly, depending on the Virtual File System (VFS) implementation in your flavor of *nix. (Notice how the first time you access a file it's slower than the second time you execute the exact same command?)
To learn more read IBM's article on the Anatomy of the Unix file system.
Typical flow for a program like ls would be:
Opendir on the current path.
Readdir for the current path.
Filter the entries returned by readdir through the filter provided on the command line. So typically O(n).
This is the generic flow; however, there are many optimizations in place for special and frequent cases (like caching of inode numbers for recent and frequently used paths).
Also, it depends on how directory files are organized. In classic Unix file systems entries are stored in order of creation, forcing a read of every entry and pushing the look-up time to O(n). In NTFS, the equivalent of a directory file is sorted by name.
I can't answer your question. Maybe if you take a peek into the source code, you could answer your question yourself and explain to us how it works.
ls.c
ls.h
I'll admit that I don't know the inner workings of the unix operating system, so I was hoping someone could shed some light on this topic.
Why is the Unix file system better than the windows file system?
Would grep work just as well on Windows, or is there something fundamentally different that makes it more powerful on a Unix box?
e.g. I have heard that in a Unix system, the number of files in a given directory will not slow file access, while on Windows direct file access will degrade as the number of files increases in the given folder; true?
Updates:
Brad, no such thing as the unix file system?
One of the fundamental differences in filesystem semantics between Unix and Windows is the idea of inodes.
On Windows, a file name is directly attached to the file data. This means that the OS prevents somebody from deleting a file that is currently open. On some versions of Windows you can rename a file that is currently open, and on some versions you can't.
On Unix, a file name is a pointer to an inode, which is the place the file data is actually stored. This has a couple of implications:
You can have two different filenames that refer to the same underlying file. This is often called a hard link. There is only one copy of the file data, so changes made through one filename will appear in the other.
You can delete (also known as unlink) a file that is currently open. All that happens is the directory entry is removed, but this doesn't affect any other process that might still have the file open. The process with the file open hangs on to the inode, rather than to the directory entry. When the process closes the file, the OS deletes the inode because there are no more directory entries pointing at it and no more processes with the inode open.
This difference is important, but it is unrelated to things like the performance of grep.
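A small C sketch of the second point - unlinking a file that a process still has open - may make it clearer; the file name "scratch.txt" is made up for illustration.

/* Delete (unlink) a file while it is still open. The directory entry is
 * gone immediately, but the inode and its data survive until the last
 * open file descriptor is closed. */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    const char *path = "scratch.txt";

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    write(fd, "still here\n", 11);

    unlink(path);                        /* the name is gone: ls will no longer show it */

    char buf[32] = {0};
    lseek(fd, 0, SEEK_SET);
    read(fd, buf, sizeof buf - 1);       /* the open descriptor still sees the data */
    printf("read after unlink: %s", buf);

    close(fd);                           /* link count 0 and no open fds: now the inode is freed */
    return 0;
}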
First, there is no such thing as "the Unix file system".
Second, upon what premise does your argument rest? Did you hear someone say it was superior? Perhaps if you offered some source, we could critique the specific argument.
Edit: Okay, according to http://en.wikipedia.org/wiki/Comparison_of_file_systems, NTFS has more green boxes than both UFS1 and UFS2. If green boxes are your measure of "better", then NTFS is "better".
Still a stupid question. :-p
I think you are a little bit confused. There are no 'Unix' and 'Windows' file systems. The *nix family of filesystems includes ext3, ZFS, UFS, etc. Windows has primarily had support for FAT16/32 and its own filesystem, NTFS. However, today Linux systems can read and write NTFS. More filesystems here
I can't tell you why one could be better than the other though.
I'm not at all familiar with the inner workings of the UNIX file systems, as in how the bits and bytes are stored, but really that part is interchangeable (ext3, reiserfs, etc).
When people say that UNIX file systems are better, they might mean, "Oh, ext3 stores bits in such a way that corruption happens far less than with NTFS", but they might also be talking about design choices made at the common layer above. They might be referring to how the path of a file does not necessarily correspond to any particular device. For example, if you move your program files to a second disk, you probably have to refer to them as "D:\Program Files", while in UNIX /usr/bin could be a hard drive, a network drive, a CD-ROM, or RAM.
Another possibility is that people are using "file system" to mean the organization of paths. Like, for instance, how Windows generally likes programs in "C:\Program Files\CompanyName\AppName" while a particular UNIX distribution might put most of them in /usr/local/bin. In the latter case, you can access much more of your system readily from the command line with a much smaller PATH variable.
Also, since you mentioned grep, if all the source code for system libraries such as the kernel and libc is stored in /usr/local/src, doing a recursive grep for a particular error message coming from the guts of some system library is much simpler than if things were laid out as /usr/local/library-name/[bin|src|doc|etc]. If you already have an inkling of where you're searching, though, cygwin grep performs quite well under Windows. In fact, I find for full-text searching I get better results from grep than the search facilities built into Windows!
Well, the *nix filesystems do a far better job of actual file management than FAT16/32 or NTFS. The *nix systems try to prevent the need for defragmentation, whereas Windows does... nothing? Other than that I don't really know what would make one better than the other.
There are differences in how Windows and Unix operating systems expose the disk drives to users and how drive space is partitioned.
The biggest difference between the two operating systems is that Unix essentially treats all of the physical drives as one logical drive. (This isn't exactly how it works, but it should give a good enough picture.) This allows a much simpler file system from the user's perspective, as there are no drive letters to deal with. I have a folder called /usr/bin that could span multiple physical drives. If I need to expand that partition I can do so by adding a new drive, remapping the folder, and moving the files. (Again, somewhat simplified, but it gets the point across.)
The other difference is that when you format a drive, a certain amount is set aside (by default; as an admin you can change the size to 0 if you want) for use by the "root" account (admin account), which allows an admin to almost always be able to log in to the machine even when a user has filled the disk and is receiving "out of disk space" messages.
One simple answer:
Windows is proprietary, which means no one can see its code except Microsoft, while Unix/Linux are open source. Because they are open source, many bright minds have contributed to the filesystems, making them robust and efficient; hence effective commands like grep come to our rescue when truly needed.
I don't know enough about the guts of the file systems to answer the first question, except that when I read the first descriptions of NTFS it sounded an awful lot like the Berkeley Fast File System.
As for the second, there are plenty of greps for Windows. When I had to use Windows in the past, I always installed Cygwin first thing.
The answer turns out to have very little to do with the filesystem and everything to do with the filesystem access drivers.
In particular, the implementation of NTFS on Windows is very slow compared to ext2/ext3. Also, on Windows you get "can't delete file in use" even though NTFS should be able to support it.