What is the general syntax of a Unix shell command? - unix

In particular, why is that sometimes the options to some commands are preceded by a + sign and sometimes by a - sign?
for example:
sort -f
sort -nr
sort +4n
sort +3nr

These days, the POSIX standard using getopt() (aka getopt(3)) is widely used as a standard notation, but in the early days, people were experimenting. On some machines, the sort command no longer supports the + notation. However, various commands (notably ar and tar) accept controls without any prefix character - and dd (alluded to by Alok in a comment) uses another convention altogether.
The GNU convention of using '--' for long options (supported by getopt_long(3)) was changed from using '+'. Of course, the X11 software uses a single dash before multi-character options. So, the whole thing is a collection of historic relics as people experimented with how best to handle it.
POSIX documents the Utility Conventions that it works to, except where historical precedent is stronger.
What styles of option handling are there?
[At one time, SO 367309 contained the following material as my answer. It was originally asked 2008-12-15 02:02 by FerranB, but was subsequently closed and deleted.]
How many different types of options do you recognize? I can think of
many, including:
Single-letter options preceded by single dash, groupable when there is
no argument, argument can be attached to option letter or in next
argument (many, many Unix commands; most POSIX commands).
Single-letter options preceded by single dash, grouping not allowed,
arguments must be attached (RCS).
Single-letter options preceded by single dash, grouping not allowed,
arguments must be separate (pre-POSIX SCCS, IIRC).
Multi-letter options preceded by single dash, arguments may be
attached or in next argument (X11 programs; also Java and many programs on Mac OS X with a NeXTSTEP heritage).
Multi-letter options preceded by single dash, may be abbreviated
(Atria Clearcase).
Multi-letter options preceded by single plus (obsolete).
Multi-letter options preceded by double dash; arguments may follow '='
or be separate (GNU utilities).
Options without prefix/suffix, some names have abbreviations or are
implied, arguments must be separate. (AmigaOS
Shell)
For options taking an optional argument, sometimes the argument must be attached (co -p1.3 rcsfile.c),
sometimes it must follow an '=' sign. POSIX doesn't support optional
arguments meaningfully (the POSIX getopt() only allows them for the last
option on the command line).
All sensible option systems use an option consisting of double-dash
('--') alone to mean "end of options" — the following arguments are
"non-option arguments" (usually file names; POSIX calls them 'operands')
even if they start with a
dash. (I regard supporting this notation as an imperative. Be aware that if the -- is preceded by an option requiring an argument, the -- will be treated as the argument to the option, not as the 'end of options' marker.)
Many but not all programs accept single dash as a file name to mean
standard input (usually) or standard output (occasionally). Sometimes,
as with GNU 'tar', both can be used in a single command line:
... | tar -cf - -F - | ...
The first solo dash means 'write to stdout'; the second means 'read file
names from stdin'.
Some programs use other conventions — that is, options not preceded by a
dash. Many of these are from the oldest days of Unix. For example,
'tar' and 'ar' both accept options without a dash, so:
tar cvzf /tmp/somefile.tgz some/directory
The dd command uses opt=value exclusively:
dd if=/some/file of=/another/file bs=16k count=200
Some programs allow you to interleave options and other arguments
completely; the C compiler, make and the GNU utilities run without
POSIXLY_CORRECT in the environment are examples. Many programs expect
the options to precede the other arguments.
Note that git and other VCS commands often use a hybrid system:
git commit -m 'This is why it was committed'
There is a sub-command as one of the arguments. Often, there will be optional 'global' options that can be specified between the command and the sub-command. There are examples of this in POSIX; the sccs command is in this category; you can argue that some of the other commands that run other commands are also in this category: nice and xargs spring to mind from POSIX; sudo is a non-POSIX example, as are svn and cvs.
I don't have strong preferences between the different systems. When
there are few enough options, then single letters with mnemonic value
are convenient. GNU supports this, but recommends backing it up with
multi-letter options preceded by a double-dash.
There are some things I do object to. One of the worst is the same
option letter being used with different meanings depending on what other
option letters have preceded it. In my book, that's a no-no, but I know
of software where it is done.
Another objectionable behaviour is inconsistency in style of handling
arguments (especially for a single program, but also within a suite of
programs). Either require attached arguments or require detached
arguments (or allow either), but do not have some options requiring an
attached argument and others requiring a detached argument. And be
consistent about whether '=' may be used to separate the option and
the argument.
As with many, many (software-related) things — consistency is more
important than the individual decisions. Using tools that automate
and standardize the argument processing helps with consistency.
Whatever you do, please, read the TAOUP's Command-Line Options and
consider Standards for Command Line Interfaces. (Added by J F
Sebastian — thanks; I agree.)

It's completely arbitrary; the command may implement all of the option handling in its own special way or it might call out to some other convenience functions. The getopt() family of functions is pretty popular, so most software written even remotely recently follows the conventions set by those routines. There are always exceptions, of course!

It's left to apps to parse options hence the inconsistency. Expanding on your sort example these are all equivalent for coreutils:
sort -k3
sort --k 3
sort --key 3
sort --key=3
_POSIX2_VERSION=199209 sort +2

A shell command is just a program, and it is free to interpret its command line any way it likes.
Unix never had anything like Apple's interface police to make sure that the command-line interface was consistent across applications. As a result, there is inconsistency, especially in older commands.
Peering into my crystal ball, I think command-line tools will slowly migrate toward GNU standards, double dashes and all. (I grew up with single dashes and still find the double dash very awkward, but it is consistent.)

Related

clearmake doesn't like my MAKEFLAGS=j12 values

I use both GNU Make and - woe is me - ClearCase' clearmake.
Now, GNU make respect a flag named MAKEFLAGS, which for me is set to j20 on this multi-core machine I'm on. Unfortunately, clearmake also recognizes this option, yet doesn't except this value. It tells me:
clearmake: Error: Bad option (j)
clearmake: Error: Bad option (2)
clearmake: Error: Bad option (0)
Questions:
Why is this happening? Should ClearMake accommodate GNU Make's usage?
How can I get around it, other then turning the flag off an on repeatedly?
It's been 15 years or so since I used clearmake, but assuming it doesn't support the GNU make-specific GNUMAKEFLAGS variable you can use:
export GNUMAKEFLAGS=-j20
and leave MAKEFLAGS unset.
The "BUILDING SOFTWARE WITH CLEARCASE" clearly states in its T"unsupported Gnu make features" that this option is indeed not supported.
–j [JOBS]
--jobs=[JOBS]
Maybe a clearmake -C -J can help (for testing): there should then be no limit to the number of parallel builds.
Are you calling GNU make from a clearmake build script? Or are you trying to create a single makefile that will support both build tools? I think the GNUMAKEFLAGS EV is safer for GNU make specific values. I would also use
CCASE_MAKEFLAGS for any makeflags that are specific to clearmake.
CCASE_CONC to set the concurrency value. While clearmake no longer passes the -J in MAKEFLAGS, it used to, and if you're using an older clearmake (somewhere in the 7's as I recall), you could upset "child" GNU make sessions since they like -J about as much as clearmake likes -j.
Finally, check the env_ccase man page for the behavior mentioned in CCASE_MAKEFLAGS_V6_OBSOLETE. If you pass MAKEFLAGS explicitly in the build script like
$(MAKE) $(MAKEFLAGS) TARGET=x
And had started clearmake like this:
clearmake -C gnu TARGET=Y
You'll actually get both TARGET macro definitions in the command line. Setting the mentioned EV (at all) avoids the "pass defined macros in MAKEFLAGS" behavior. The switch exists because some people have makefiles that DEPEND on this behavior, while others have ones BROKEN BY this behavior...
Assuming for the sake of argument that your company has a support agreement with either IBM or HCL, this is a good time to use your support channels to bring up clearmake concerns.

Perl: Shebang (space?) "#! "?

I've seen both:
#!/path/...
#! /path/...
What's right? Does it matter? Is there a history?
I've heard that an ancient version of Unix required there not be a space. But then I heard that was just a rumor. Does anyone know for certain?
Edit: I couldn't think where better to ask this. It is programming related, since the space could make the program operate in a different way, for all I know. Thus I asked it here.
I also have a vague memory that whitespace was not allowed in some old Unix-like systems, but a bit of research doesn't support that.
According to this Wikipedia article, the #! syntax was introduced in Version 8 Unix in January, 1980. Dennis Ritchie's initial announcement of this feature says:
The system has been changed so that if a file being executed begins
with the magic characters #!, the rest of the line is understood to
be the name of an interpreter for the executed file. Previously (and
in fact still) the shell did much of this job; it automatically
executed itself on a text file with executable mode when the text
file's name was typed as a command. Putting the facility into the
system gives the following benefits.
[SNIP]
To take advantage of this wonderful opportunity, put
#! /bin/sh
at the left margin of the first line of your shell scripts. Blanks
after ! are OK. Use a complete pathname (no search is done). At the
moment the whole line is restricted to 16 characters but this limit
will be raised.
It's conceivable that some later Unix-like system supported the #! syntax but didn't allow blanks after the !, but given that the very first implementation explicitly allowed blanks, that seems unlikely.
leonbloy's answer provides some more context.
UPDATE :
The Perl interpreter itself recognizes a line starting with #!, even on systems where that's not recognized by the kernel. Run perldoc perlrun or see this web page for details.
The #! line is always examined for switches as the line is being
parsed. Thus, if you're on a machine that allows only one argument
with the #! line, or worse, doesn't even recognize the #! line, you
still can get consistent switch behaviour regardless of how Perl was
invoked, even if -x was used to find the beginning of the program.
Perl also permits whitespace after the #!.
(Personally, I prefer to write the #! line without whitespace, but it will work either way.)
And leonjoy's answer points to this web page by Sven Mascheck, which discusses the history of #! in depth. (I mention this now because of a recent discussion on comp.unix.shell.)
It seems to usually work both ways. See here. I'd say that the no-space version is much more common today, and, to me, much more appealing.
BTW, this is not specifically related to Perl (but it's definitely related to programming).

Potential Dangers of ALIASing a Unix Command Starting with "."?

I'd like to use alias to make some commands for myself when searching through directories for code files, but I'm a little nervous because they start with ".". Here's some examples:
$ alias .cpps="ls -a *.cpp"
$ alias .hs="ls -a *.h"
Should I be worried about encountering any difficulties? Has anyone else done this?
What is the advantage of putting the dot in the names? It seems like an unnecessary extra character. I'd just use the base names (hs and cpps) for the aliases.
I suppose that it might be argued that the dot indicates that the command is an alias - but why is that distinction beneficial? One of the great things about Unix was that it removed the distinction between hallowed commands provided by the O/S and programs written by the user. They are all equal - just located in different places.
I don't see any real dangers with using aliases that start with a dot. It would never have occurred to me to try; I'm mildly surprised that they are allowed. But given that they are allowed, there is no real risk involved that I can see.
I wouldn't use '.' to begin your aliases because it's next to '/' and you could hit the two together by mistake and accidentally run an executable in your current directory (especially if you use tab completion).
I doubt that there's any technical problem though it's likely to be confusing to anyone who has used Unix for a long time. In my world commands don't have dots in them and file names don't have spaces or upper case letters!

Why do <C-PageUp> and <C-PageDown> not work in vim?

I have Vim 7.2 installed on Windows. In GVim, the <C-PageUp> and <C-PageDown> work for navigation between tabs by default. However, it doesn't work for Vim.
I have even added the below lines in _vimrc, but it still does not work.
map <C-PageUp> :tabp<CR>
map <C-PageDown> :tabn<CR>
But, map and works.
map <C-left> :tabp<CR>
map <C-right> :tabn<CR>
Does anybody have a clue why?
The problem you describe is generally caused by vim's terminal settings not knowing the correct character sequence for a given key (on a console, all keystrokes are turned into a sequence of characters). It can also be caused by your console not sending a distinct character sequence for the key you're trying to press.
If it's the former problem, doing something like this can work around it:
:map <CTRL-V><CTRL-PAGEUP> :tabp<CR>
Where <CTRL-V> and <CTRL-PAGEUP> are literally those keys, not "less than, C, T, R, ... etc.".
If it's the latter problem then you need to either adjust the settings of your terminal program or get a different terminal program. (I'm not sure which of these options actually exist on Windows.)
This may seem obvious to many, but konsole users should be aware that some versions bind ctrl-pageup / ctrl-pagedown as secondary bindings to it's own tabbed window feature, (which may not be obvious if you don't use that feature).
Simply clearing them from the 'Configure Shortcuts' menu got them working in vim correctly for me. I guess other terminals may have similar features enabeld by default.
I'm adding this answer, taking details from vi & Vim, to integrate those that are already been given/accepted with some more details that sound very important to me.
The alredy proposed answers
It is true what the other answer says:
map <C-PageUp> :echo "hello"<CR> won't work because Vim doesn't know what escape sequence corresponds to the keycode <C-PageUp>;
one solution is to type the escape sequence explicitly: map ^[[5^ :echo "hello"<CR>, where the escape sequence ^[[5^ (which is in general different from terminal to terminal) can be obtained by Ctrl+VCtrl+PageUp.
One additional important detail
On the other hand the best solution for me is the following
set <F13>=^[[5^
map <F13> :echo "hello"<CR>
which makes use of one of additional function key codes (you can use up to <F37>). Likewise, you could have a bunch of set keycode=escapesequence all together in a single place in your .vimrc (or in another dedicated file that you source from your .vimrc, why not?).

console print w/o scrolling

I see console apps print colors and seen apps such as ffmpeg print text over itself instead of a new line. How do I print over an existing line? I want to display fps in my console app either at the very top or very bottom and have regular printfs go there and scroll normally.
I need this for windows, but this is meant to be cross platform, so I will eventually have a linux and mac implementation.
There is two simple possibilities which work on linux as well as windows, but only for one line:
printf("\b"); will return for one character, so you might count how many character you want to backspace and fire this in a loop, or you know that you only write n numbers and do it likeprintf("\b\b\b\b\b\b\b\b\b\b");
printf("text to be overwritten by next printf\r"); this will return the cursor to the beginning of the line, so any next printf will overwrite it. Make sure to write a string of same length or longer so you overwrite it entirely.
If you want to rewrite several lines, there is nothing so portable as ncurses, there is libs for it on practically every operating system, and you don't have to take care of the ANSI-differences.
edit: added link to ncurses wikipedia page, gives great overview and introduction, as well as link list and maybe a translation to your preferred language
Check out ncurses. It has bindings for most scripting languages.
You can use '\r' instead of '\n'.
The ASCII character number 8 (A.K.A. Ctrl-H, BS or Backspace) lets you back up one character. ASCII Character number 13 (A.K.A Ctrl-M, CR or Carriage Return) returns the cursor at the beggining of the line.
If you are working in C try putchar(8); and putchar(13);
The magic of the colors, cursor locating and bliking and so on are inside ANSI escape codes. Any text console capable of handling ANSI codes can use them just printing them out to console (i.e. by means of echo in a bash script or printf() function in C).
Unix terminals support ANSI escape sequences and Windows world used to support them back in old MS-DOS days, but the multibyte console support put an end to this. There is more information here. However there are other ways out of just ANSI sequences printing available on Windows. Moreover if you have Cygwin installed on your Windows maching ANSI codes work just as great as on any Unix terminal.
Many people mention Ncurses library that is the de-facto standard for any gui-like text based applications. What this library does is to hide all the terminal differences (Windows/Unix flavours) to represent the same information as identical as possible across all the platforms, though from my own experience I tell you this is not always true (i.e. typical text window frames change because the especial chars are not available under all character encodings). The counterpart of using ncurses is that it is a complete API and it is much harder to start out with it than simply writing out some ANSI escape sequences for simple things such as change the font color, cleaning screen or moving back the cursor to a random position.
For the sake of completeness I paste an example of use of ANSI sequence under Linux that changes the prompt to blue and shows the date:
PS1="\[\033[34m\][\$(date +%H%M)][\u#\h:\w]$ "
You can use Ncurses -
ncurses package is a subroutine library for terminal-independent screen-painting and input-event handling which presents a high level screen model to the programmer, hiding differences between terminal types and doing automatic optimization of output to change one screenfull of text into another
Depending on the platform which you are developing on there's probably a more powerful API which you could use, rather than old ASCII control codes.
e.g. If you are working on Win32 you can actually manipulate the console screen buffer directly.
A good place to start might be here
http://msdn.microsoft.com/en-us/library/ms683171(VS.85).aspx
I have been looking for similar functions/API which would allow me to access the console as something other than a stream of text for other platforms. Haven't found anything yet, but then again, I haven't been looking that hard.
Hope it helps.

Resources