Unix directory structure: managing file name collision

Usually when `make install' is run, files are not put in a program-specific directory like /usr/prog1. Instead, they are put in directories that already contain files from other programs, such as /usr/lib and /usr/bin. I believe this has been common practice for a long time. This practice surely increases the probability of file name collisions.
Since my googling returned no good discussion on this matter, I am wondering what people do to manage file name collisions. Do they simply try this or that name and, if something goes wrong, a user files a bug and the developer picks another name? Or do they simply prefix the names of their files? Is anyone aware of a good discussion on this matter?

Usually people choose the name they want and if something collides then the problem gets resolved by the distribution. That's what happened with ack (ack in Debian, Kanji converter) and ack (ack-grep in Debian, text search utility).
Collisions don't seem to be that common, though. A quick web search should tell you if the name is already used somewhere. If it's not searchable, it's probably not included in many distributions, and that means you're not likely to actually conflict.

When compiling programs, you can usually specify a prefix path like this: ./configure --prefix=/usr/local/prog1 or ./configure --prefix=/opt/prog1 (whether you use /usr/local or /opt doesn't really matter). Then when running make install it'll put the files under the specified prefix. You can then either 1) add /opt/prog1/bin/ to your PATH, or 2) make a symlink to the executable in /usr/local/bin, which should already be in your PATH.
The best thing is to use your distribution's package manager, though.
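As a rough illustration of the prefix-and-symlink workflow (prog1 is a made-up package name; any autotools-based source tree works the same way):
tar xzf prog1-1.0.tar.gz && cd prog1-1.0
./configure --prefix=/opt/prog1     # everything will be installed under /opt/prog1
make
sudo make install
# Either put the new bin directory on your PATH...
export PATH="/opt/prog1/bin:$PATH"
# ...or symlink the executable into a directory that is already on it
sudo ln -s /opt/prog1/bin/prog1 /usr/local/bin/prog1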

Streamlit: Disable the guard against running files without a .py extension?

I have a problem. Universally, my experience working in Unix systems has been that, by the time you are ready to place an executable "thing" in a bin folder for global access, you have decided to #! the file with the requisite interpreter:
#!/bin/awk
#!/bin/bash
#!/bin/perl
#!/bin/python3.8
#!/bin/whatever
And, although it is fine to have clutter at the local scope, when one places an executable in the bin folder, it should have:
A POSIX CLI interface
No discernible language tags or what have you
This is because it is now intended to be used for difficult work that requires forgetting about the details of this or that language: one now needs to think in terms of the functions as if the composable units are part of a consistent language, rather than a dozen different languages from a dozen different expert contributors.
This is the "genius" of the Unix/Linux/Posix architecture.
Anyways, when structuring my python projects, the end game is copying python executables to a global source on the path -- whether that "global" source is a pretend global source in my home directory (i.e., ~/.mytools/bin) or the actual global path (/usr/bin or something like that) -- and generally I want my python executables to have the same "game feel" as C executables, perl executables, BASH/ZSH/etc. executables.
In that vein, I knock off the extensions from my scripts and executables when they go in the bin. There is no need to know, from my usage perspective, what anything is made of when I go to use it.
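For example (mytool.py is a hypothetical script name; ~/.mytools/bin is the "pretend global" directory mentioned above), the copy step might look like:
mkdir -p ~/.mytools/bin
install -m 0755 mytool.py ~/.mytools/bin/mytool   # extension dropped on install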
However, streamlit requires me to re-append the .py to the file in the global path in order to run it with streamlit run. This is a case of the library reaching beyond its useful scope and holding me hostage, from my perspective, unless I violate best practices when extending the bin folder with python executables.
This means I have to create special logic to handle just streamlit, and that is really a kerfuffle. I have to either change the way I handle all executables, or hardcode just the executable that will be run with streamlit. That means that, all of a sudden, I have an arbitrary name in my meta-control code for my project.
That is bad. Why? Because I have to remember that I did it, and remember to change it if I change the executable name. I also have to remember to add to it if I add another streamlit executable.
Alternatively, I can copy all my exes made with python into the root bin folders with their .py extensions, which is not what I wanted to do.
How does one bypass this issue in streamlit?
If bin/sometool needs to be invoked with Streamlit via streamlit run bin/sometool, it seems like you're already exposing "meta-control code" to users of your bin script, right?
Instead, would this solve your problem?
bin/sometool:
#!/bin/bash
DIR=$(dirname "$0")
streamlit run "$DIR"/the_actual_script.py
(Where the_actual_script.py sits inside bin, but has chmod -x so that it's not directly executable.)

How can I permanently set an environment variable using Autotools?

I'm adapting an existing program to use Autotools for its build, but the resulting process depends on an environment variable. Is there a way to permanently set this environment variable during the build or installation process?
The program is intended to be used by Unix users, and I could try to append an export command directly to the .bashrc file and warn the user in case it fails, because most of them will actually just use Ubuntu to run it (it's a relatively simple program that targets students), but I'd like to know if there's a more portable way to do this.
That's what I wouldn't like to do:
echo 'export VAR=/my/totally/not/hardcoded/path' >> $HOME/.bashrc
Sorry to come to this late, but all of the answers to date are shockingly ... incomplete.
Building and installing software are both core use cases for the Autotools, and the installation part can absolutely involve adding or modifying files that affect user environments. If the software is installed by a user with sufficient privilege, then such effects can absolutely be applied to all system users, though the details may vary a bit from system to system (and the Autotools can help with that, too!).
For example, on RedHat-family Linuxes such as RedHat Enterprise, Fedora, Oracle Linux, and various others, you can drop an appropriately named file in /etc/profile.d, and the commands in it will automatically be read and executed by every login shell. Setting environment variables for all users is one of the common uses of this feature. I'm uncertain about Debian-family Linuxes such as Ubuntu, but it is always possible to modify file /etc/profile instead to have the same effect, and you absolutely can write an Automake install hook to do that.
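As a rough sketch (the file name, variable name, and path are invented for illustration), the snippet such an install hook might drop into /etc/profile.d could look like this:
/etc/profile.d/myprog.sh:
# Read by every login shell on RedHat-family systems; sets the variable myprog needs
export MYPROG_DATA=/usr/local/share/myprog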
Or for an altogether different approach, you can always provide a wrapper script around your program that sets the needed environment variables (supposing that the point is other than to add a directory to the PATH so as to find the program in the first place). In that case, you can even install the main program in a location that is not ordinarily in the path, so that users don't accidentally run it directly. This mechanism has the advantage that the environment variables are scoped to a run of the program, not a whole login session, but the disadvantage that users cannot override them.
I guess, no.
The Autotools are about building your program, not about setting up the environment for the program to run in. That's what users/admins are supposed to do. (Well, I can imagine doing this, but I really don't want to try to figure it out, because the idea itself seems broken to me.)
If your program REALLY needs some environment variable at run time, then you should patch your sources so the application tests whether the variable exists and, if it doesn't, sets it to a sensible default value. Another idea is to require an obligatory command line switch to pass the value in.
It's not clear what this has to do with autotools (or any other build system). No build system, by itself, can arrange for an environment variable to be present when the program it builds is run at a later time.
One solution is for your program to have a hardcoded default value for the var which is used if the environment var isn't present when the program starts running. Another frequently used solution is to name your binary something like myprog.bin and install a shell script named myprog which sets up the environment before doing exec myprog.bin.
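A minimal sketch of that wrapper approach (the variable name and paths are placeholders):
myprog:
#!/bin/sh
# Set up the environment, then replace this shell with the real binary
export MYPROG_DATA=/usr/local/share/myprog
exec /usr/local/bin/myprog.bin "$@"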
You've not been very concrete about what the program is (e.g. is the program a daemon? A user program?) or the nature of the environment variable dependency (e.g. is it another program? A mount point? A URL? A DB connection string?). Being more specific might give a better answer for you.
Anyway, autotools is not likely to offer any feature to help: it's a build system. Depending on the nature of your environment variable dependency, you're likely going to need package management (if you package it) or system-administration-level setup.
Since you think your primary user base is on Ubuntu this help page might give you some ideas.

How to install tools for everyone to use in Unix

I'm a freshman, and I created a server with my roommates in order to practice maintaining a server.
We installed CentOS 7, and I would like to ask how I can install a tool for everyone to use.
More particularly, we want to install Cromwell. But since they don't have instructions on how to install it on Unix, I downloaded Linuxbrew and installed it that way.
The downside is that it's not visible to the other users connected to the server.
I know this is a noob question, but any response would be appreciated.
A standard Unix machine has programs (tools and so on) installed in predefined directories like /bin, /usr/bin, and perhaps /usr/local/bin. Which to choose is another matter; probably you want /usr/bin. The environment variable PATH also plays a role.
In the chosen directory there should be a file representing the "tool". You can put a copy of the executable file in that directory and set (or check) its permissions. Execution permission can be granted to all users, or only some; it depends. In other words,
/home/me/.linuxbrew/Cellar/cromwell
is not a good place for a "system" tool or app; you should copy that executable into /usr/bin, set its ownership (perhaps to root?) with chown, and set the correct permissions with chmod (a concrete sketch follows at the end of this answer).
You can make a hard link of your executable into the directory; this saves space, but also means that there is only one copy of the executable. Having two different copies (the "stable" one, and the other one you can fiddle with) can be handy.
Once the executable is reachable and executable by the chosen users, it may need some support files. To find them, it can rely on fixed locations, some environment variable, or some configuration file. But all of that is outside the scope of the question.
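As a concrete sketch of those steps, reusing the Linuxbrew path from the question (adjust if the real executable lives elsewhere):
sudo cp /home/me/.linuxbrew/Cellar/cromwell /usr/bin/cromwell   # keep a separate "stable" copy
sudo chown root:root /usr/bin/cromwell                          # owned by root
sudo chmod 755 /usr/bin/cromwell                                # readable and executable by all users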
Try this command:
you@machine$ sudo chmod [who][op][permissions] filename
"who" refers to the users that have a particular permission: the user ("u"), the group ("g"), or other users ("o", also known as "world"). "op" determines whether to add ("+"), remove ("-") or explicitly set ("=") the particular permissions. "permissions" are whether the file should be readable ("r"), writable ("w"), or executable ("x"). As an example:
you@machine$ chmod o+x file
will add executable permission for others to file.

.goutputstream-XXXXX - possible to relocate?

I've been trying to create a union file system for a college project. One of its features that differentiates it from unionfs is the fact that there are no copy-ups. This means that if a file is located in a certain branch, it will remain there even if it is written to.
But my current problem with that is the fact that .goutputstream-XXXXX are created, renamed, and deleted whenever a write operation occurs. This is actually OK if the file being written to is in the highest priority branch (i.e. the default branch where files can be created), but makes my kernel crash if I try to write to a file in a lower branch.
How do I deal with this? How can I rig it so that all .goutputstream-XXXXX files are written to only one location? These .goutputstream-XXXXX files seem to be intricately connected to the files they correspond to, and seem to work only in the same directory as the file being written to.
I also noticed that .goutputstream-XXXXX files appear when a directory is read. What are they for, anyway?
There has been a bug submitted to the Ubuntu Launchpad in which the creation of .goutputstream-xxxxx files is discussed.
https://bugs.launchpad.net/ubuntu/+source/lightdm/+bug/984785
From what I see now, these files are created when shutting down without a preceding logout, but they can come from several other sources as well, such as Evince or maybe gedit.
Maybe lightdm has something to do with the creation of these files.
Which distribution did you use? Maybe changing the distribution would help.
.goutputstream-XXXXX files are created by gedit, and there is no simple way (menu or settings) to relocate them.
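For what it's worth, the pattern behind these files appears to be an ordinary "atomic save": the application writes the new contents to a temporary file in the same directory as the target, then renames it over the original, so a crash can never leave a half-written file. A rough shell equivalent (paths and names invented for illustration):
printf '%s\n' "new contents" > /branch/.goutputstream-ABC123   # write the temp file next to the target
mv /branch/.goutputstream-ABC123 /branch/document.txt          # atomically replace the original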

What makes the Unix file system superior to the Windows file system?

I'll admit that I don't know the inner workings of the unix operating system, so I was hoping someone could shed some light on this topic.
Why is the Unix file system better than the windows file system?
Would grep work just as well on Windows, or is there something fundamentally different that makes it more powerful on a Unix box?
e.g., I have heard that on a Unix system, the number of files in a given directory will not slow file access, while on Windows direct file access will degrade as the number of files in the folder increases. True?
Updates:
Brad, no such thing as the unix file system?
One of the fundamental differences in filesystem semantics between Unix and Windows is the idea of inodes.
On Windows, a file name is directly attached to the file data. This means that the OS prevents somebody from deleting a file that is currently open. On some versions of Windows you can rename a file that is currently open, and on some versions you can't.
On Unix, a file name is a pointer to an inode, which is the place the file data is actually stored. This has a couple of implications:
You can have two different filenames that refer to the same underlying file. This is often called a hard link. There is only one copy of the file data, so changes made through one filename will appear in the other.
You can delete (also known as unlink) a file that is currently open. All that happens is the directory entry is removed, but this doesn't affect any other process that might still have the file open. The process with the file open hangs on to the inode, rather than to the directory entry. When the process closes the file, the OS deletes the inode because there are no more directory entries pointing at it and no more processes with the inode open.
This difference is important, but it is unrelated to things like the performance of grep.
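A quick shell illustration of both points (file names are arbitrary):
echo hello > a.txt
ln a.txt b.txt            # hard link: two names, one inode, one copy of the data
echo world >> b.txt
cat a.txt                 # shows both lines, because a.txt and b.txt are the same file
tail -f a.txt &           # a process now holds the file open
rm a.txt b.txt            # both directory entries are gone...
kill %1                   # ...but the data is only freed once the last open handle closes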
First, there is no such thing as "the Unix file system".
Second, upon what premise does your argument rest? Did you hear someone say it was superior? Perhaps if you offered some source, we could critique the specific argument.
Edit: Okay, according to http://en.wikipedia.org/wiki/Comparison_of_file_systems, NTFS has more green boxes than both UFS1 and UFS2. If green boxes are your measure of "better", then NTFS is "better".
Still a stupid question. :-p
I think you are a little bit confused. There are no 'Unix' and 'Windows' file systems. The *nix family of filesystems includes ext3, ZFS, UFS, etc. Windows has primarily had support for FAT16/32 and its own filesystem, NTFS. However, today Linux systems can read and write NTFS. More filesystems here
I can't tell you why one could be better than the other though.
I'm not at all familiar with the inner workings of the UNIX file systems, as in how the bits and bytes are stored, but really that part is interchangeable (ext3, reiserfs, etc).
When people say that UNIX file systems are better, they might be saying, "Oh, ext3 stores bits in such a way that corruption happens way less than with NTFS", but they might also be talking about design choices made at the common layer above. They might be referring to how the path of a file does not necessarily correspond to any particular device. For example, if you move your program files to a second disk, you probably have to refer to them as "D:\Program Files", while on UNIX /usr/bin could be a hard drive, a network drive, a CD-ROM, or RAM.
Another possibility is that people are using "file system" to mean the organization of paths. Like, for instance, how Windows generally likes programs in "C:\Program Files\CompanyName\AppName" while a particular UNIX distribution might put most of them in /usr/local/bin. In the latter case, you can access much more of your system readily from the command line with a much smaller PATH variable.
Also, since you mentioned grep, if all the source code for system libraries such as the kernel and libc is stored in /usr/local/src, doing a recursive grep for a particular error message coming from the guts of some system library is much simpler than if things were laid out as /usr/local/library-name/[bin|src|doc|etc]. If you already have an inkling of where you're searching, though, cygwin grep performs quite well under Windows. In fact, I find for full-text searching I get better results from grep than the search facilities built into Windows!
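For instance, that kind of search is just (the error string here is only an example):
grep -rn "No such device or address" /usr/local/src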
Well, the *nix filesystems do a far better job of actual file management than FAT16/32 or NTFS. The *nix systems try to prevent the need for a defrag, versus Windows doing... nothing? Other than that, I don't really know what would make one better than the other.
There are differences in how Windows and Unix operating systems expose the disk drives to users and how drive space is partitioned.
The biggest difference between the two operating systems is that Unix essentially treats all of the physical drives as one logical drive. (This isn't exactly how it works, but should give a good enough picture.) This allows a much simpler file system from the users perspective as there are no drive letters to deal with. I have a folder called /usr/bin that could span multiple physical drives. If I need to expand that partition I can do so by adding a new drive, remapping the folder, and moving the files. (Again, somewhat simplified, but it gets the point across.)
The other difference is that when you format a drive, a certain amount of space is set aside (by default; as an admin you can change the size to 0 if you want) for use by the "root" (admin) account. This allows an admin to almost always be able to log in to the machine, even when a user has filled the disk and is receiving "out of disk space" messages.
One simple answer:
Windows is proprietary, which means no one outside Microsoft can see its code, while Unix/Linux is open source. Because it is open source, many bright minds have contributed to the filesystems, making them robust and efficient; that is also why effective commands like grep are there to help when you truly need them.
I don't know enough about the guts of the file systems to answer the first, except that when I read the first descriptions of NTFS it sounded an awful lot like the Berkeley Fast Filesystem.
As for the second, there are plenty of greps for Windows. When I had to use Windows in the past, I always installed Cygwin first thing.
The answer turns out to have very little to do with the filesystem and everything to do with the filesystem access drivers.
In particular, the implementation of NTFS on Windows is very slow compared to ext2/ext3. Also, on Windows you still get "can't delete file in use" errors, even though NTFS should be able to support deleting open files.
