How does Unix handle a full path name with spaces and arguments?

How does Unix handle a full path name that contains spaces and is followed by arguments?
In Windows we quote the path and add the command-line arguments after it. How is it done in Unix?
"c:\foo folder with space\foo.exe" -help
Update:
I meant: how do I tell the path apart from the command-line arguments?

You can either quote it like your Windows example above, or escape the spaces with backslashes:
"/foo folder with space/foo" --help
/foo\ folder\ with\ space/foo --help

You can quote if you like, or you can escape the spaces with a preceding \, but most UNIX paths (Mac OS X aside) don't have spaces in them.
/Applications/Image\ Capture.app/Contents/MacOS/Image\ Capture
"/Applications/Image Capture.app/Contents/MacOS/Image Capture"
/Applications/"Image Capture.app"/Contents/MacOS/"Image Capture"
All refer to the same executable under Mac OS X.
I'm not sure what you mean about recognizing a path - if any of the above paths are passed as a parameter to a program the shell will put the entire string in one variable - you don't have to parse multiple arguments to get the entire path.
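To see how the shell delivers such a path, here is a minimal sketch for sh or bash. The helper name show_args is made up for illustration; it prints the argument count and then each argument in angle brackets.

```shell
# Hypothetical helper: print the argument count, then each argument in brackets.
show_args() {
    printf 'argc=%d\n' "$#"
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

# Quoted, backslash-escaped, and partially quoted forms each yield one path argument.
show_args "/foo folder with space/foo" --help
show_args /foo\ folder\ with\ space/foo --help
show_args /"foo folder with space"/foo --help

# Unquoted and unescaped, the shell splits the path into four separate words.
show_args /foo folder with space/foo --help
```

The first three calls report argc=2 with the path intact; the last one reports argc=5.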

Since spaces are used to separate command line arguments, they have to be escaped from the shell. This can be done with either a backslash (\) or quotes:
"/path/with/spaces in it/to/a/file"
somecommand -spaced\ option
somecommand "-spaced option"
somecommand '-spaced option'
This is assuming you're running from a shell. If you're writing code, you can usually pass the arguments directly, avoiding the problem:
Example in Perl. Instead of doing:
system("somecommand -spaced option");
you can do:
system("somecommand", "-spaced option");
When you pass system() a list, it doesn't break the arguments on spaces the way it does when called with a single string.

Also be careful with double quotes: in a Unix shell, some expansions still happen inside them. Some are obvious (like $foo) but some are not (like !foo, which triggers history expansion in an interactive bash session).
For safety, use single-quotes!
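A small sketch of the difference, using a made-up variable foo:

```shell
foo='some value'

# Double quotes: $foo is expanded by the shell before the string is used.
double="path is $foo"

# Single quotes: every character is kept literally, including the $.
single='path is $foo'

printf '%s\n' "$double"
printf '%s\n' "$single"
```

The first line printed contains the value of foo; the second contains the literal text $foo.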

You can quote the entire path as in windows or you can escape the spaces like in:
/foo\ folder\ with\ space/foo.sh -help
Both ways will work!

I would also like to point out that if you use command-line arguments inside a shell script (.sh file), you need to enclose each argument in quotes within the script. So if your command looks like
>scriptName.sh arg1 arg2
and arg1 is your path that has spaces, then within the shell script you need to refer to it as "$1" instead of $1.
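The effect of the missing quotes can be sketched as follows for sh or bash (zsh does not split unquoted variables by default). The names count_args and my_script are hypothetical; my_script stands in for the body of scriptName.sh.

```shell
# Hypothetical helper: report how many arguments were received.
count_args() { echo "$#"; }

# Stand-in for the script body: forwards its first argument two ways.
my_script() {
    count_args $1      # unquoted: the value is split on spaces into several words
    count_args "$1"    # quoted: the value is passed on as a single word
}

my_script "/foo folder with space/foo.sh" arg2
```

With the default IFS, the unquoted call reports 4 arguments and the quoted call reports 1.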

If the normal ways don't work, try substituting spaces with %20.
This worked for me when dealing with SSH and other domain-style commands like auto_smb.

Related

how to echo literal variable value with zsh

I have a simple shell function to convert a *nix style path to Windows style (I happen to be using Windows Subsystem for Linux).
# convert "/mnt/c/Users/josh" to "C:\Users\josh"
function winpath(){
enteredPath=$1
newPath="${enteredPath/\/mnt\/c/C:}" # replace /mnt/c with C:
newPath="${newPath//\//\\}" # replace / with \
echo $newPath
}
The desired behavior is:
$ winpath /mnt/c/Users/josh
C:\Users\josh
This works correctly in bash, but in zsh, echo seems to do some extra interpolation of the $newPath value. It behaves like this:
$ winpath /mnt/c/Users/josh
C:sers\josh
What character sequence is echo interpolating, and why is it removing the \U? Most importantly, how do I return the literal value?
I've tried digging through the zsh documentation, but it's a jungle. Thanks in advance!
zsh's echo processes certain escape sequences that bash's does not by default. \U introduces a 4-byte Unicode code point, but since the following 8 characters are not a valid hexadecimal number, no character is substituted.
I would recommend using printf, as its behavior is much more predictable from shell to shell.
printf '%s\n' "$newPath"
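Put together, a portable variant of the function might look like the sketch below. It is not the asker's code: the parameter-expansion substitutions are swapped for sed so that the sketch also runs in plain sh, and the /mnt/c prefix is only replaced at the start of the path. The printf call is what keeps the backslashes from being interpreted.

```shell
# convert "/mnt/c/Users/josh" to "C:\Users\josh"
winpath() {
    # printf '%s\n' emits its argument literally, with no escape processing.
    # sed then swaps a leading /mnt/c for C: and turns every / into \.
    printf '%s\n' "$1" | sed -e 's|^/mnt/c|C:|' -e 's|/|\\|g'
}

winpath /mnt/c/Users/josh
```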
The problem is that you are using the internal command echo, instead of the external one. If you would write
command echo $newPath
you would get the expected output. command forces zsh to look up the command word according to the current PATH, ignoring internal commands, aliases or functions of the same name.

Rscript behaves inconsistently on windows with single and double quotes

If I invoke
Rscript -e "print('hello')"
It correctly prints out the answer
[1] "hello"
However, if I switch single and double quotes, it does not work, and it looks like the double quotes are removed:
Rscript -e 'print("hello")'
gives:
Error in print(hello) : object 'hello' not found
Execution halted
Note that it's not powershell doing the escaping incorrectly. Echoing only gives the expected results:
PS> echo 'print("hello")'
print("hello")
PS> echo "print('hello')"
print('hello')
And the same behavior is not observed on macOS or Linux, where both variants are correctly parsed.
Interestingly enough, it's even crazier for command.com:
C:>Rscript -e "print('hello')"
[1] "hello"
C:>Rscript -e 'print("hello")'
[1] "print(hello)"
I mean... what?!?
This has already been mentioned here:
Single line code to run R code from Windows command line
but there's no explanation for it. In my opinion it's a bug in Rscript on Windows, but I want to hear other opinions.
Dabombber's helpful answer provides all the pointers, but let me try to boil it down conceptually:
The problem is not specific to RScript.exe and potentially affects calls to any external executable from PowerShell:
Up to at least PowerShell 7.1 (current as of this writing), passing arguments with embedded double quotes (") to external programs is fundamentally broken, as detailed in GitHub issue #1995; in short: behind the scenes, PowerShell constructs a command line for the target program (process) that uses "..."-quoting only, but neglects to escape embedded verbatim " chars. for their syntactically valid inclusion in such double-quoted strings; a fix may be coming in v7.2 - see this answer.
For now, you have to manually escape embedded " chars. as \".
However, if and when the bug gets fixed, this workaround will break, because the fix requires that this escaping be applied automatically, which would then escape a verbatim \" as \\\".
# WORKAROUND as of v7.0, which will break if and when the problem gets fixed.
PS> Rscript -e 'print(\"hello\")'
The third-party Native module (install with Install-Module -Scope CurrentUser Native, for instance) offers helper function ie, which compensates for the broken behavior; it is written in a forward-compatible manner so that it will simply defer to the built-in behavior if and when it should get fixed:
# Thanks to `ie`, no workarounds are required.
PS> ie Rscript -e 'print("hello")'
As for ad hoc workarounds - both of them work for Rscript.exe, but can't be expected to be a general solution:
For target programs that support both '...' and "..." quoting: Swap the quotes to use only embedded ' chars., as shown in your question, but note that '...' and "..." strings have different semantics in PowerShell ("..." strings are expandable (interpolating) strings), and may have different semantics in the target program too (not the case in Rscript):
Rscript -e "print('hello')"
For target programs that accept input via stdin, use the PowerShell pipeline, where the bug doesn't surface (though note that you may have to set the $OutputEncoding preference variable to the character encoding expected by the target program):
'print("hello")' | Rscript -
As for your observations and background information, including about cmd.exe and POSIX-compatible shells:
"Note that it's not powershell doing the escaping incorrectly."
As Dabombber points out, it is PowerShell that is the problem, but the problem only occurs when calling external programs, whereas echo is a built-in alias for the PowerShell-native Write-Output cmdlet (verify with Get-Command echo).
On Windows, you could see the problem with the flawed parameter passing as follows, by invoking choice.exe (ignore the [Y,N]?N suffix):
PS> 'n' | choice /m 'print("hello")'
print(hello) [Y,N]?N
choice.exe with /m can be used to echo an argument as it would be received by external programs, and as you can see the double quotes were effectively lost, because PowerShell mistakenly placed print("hello") verbatim on the process command line - without escaping the " chars. - which external programs parse as verbatim print(hello), because they allow a single argument to be composed of unquoted and double-quoted parts (print( + hello (stripped of the syntactic double quotes) + )).
If verbatim print(hello) is interpreted as an R script, it looks for a variable (object) named hello - which in this scenario doesn't exist and triggers the error message you saw.
On Unix-like platforms (macOS, Linux), using the cross-platform PowerShell [Core] edition, /bin/echo 'print("hello")' shows the same problem.
"And the same behavior is not observed on macOS or Linux, where both variants are correctly parsed."
Yes, if you use a native, POSIX-compatible shell there, such as bash, you'll get the correct behavior (see below).
"it's even crazier for command.com:"
As an aside: You probably meant cmd.exe, the legacy command processor (Command Prompt) on NT-based Windows platforms up to the current Windows 10.
(command.com was the command processor on the extinct DOS-based Windows versions that ended with Windows ME).
cmd.exe only recognizes double-quoting ("...") to demarcate argument boundaries for itself, not also single-quoting ('...'); irrespective of that, it essentially passes the original quoting through to the target executable (after performing its own interpretation of the command line, such as environment-variable expansion).
This differs fundamentally from what PowerShell and POSIX-compatible shells do:
On Unix-like platforms - where POSIX-compatible shells recognize '...'-quoted arguments - the concept of a process command line doesn't exist, and whatever arguments a POSIX-like shell has itself parsed out of its command line are passed as-is - as an array of verbatim arguments - to the target executable; thus shell string literals "print('hello')" and 'print("hello")' are passed as verbatim print('hello') and print("hello"), respectively, which works as expected, given that R too recognizes both '...' and "..." string literals.
PowerShell too has '...' strings (it treats their content verbatim), but on Windows it translates them to "..." strings behind the scenes (if quoting is needed), which is where the aforementioned bug can surface as of v7.0. The bug aside, this translation makes sense, because only "..." quoting can be assumed to have syntactic meaning on the command line for other programs (see bottom section). Unfortunately, PowerShell does the same thing on Unix-like platforms, even though it shouldn't (it constructs a pseudo command line that the .NET API then parses into an array of verbatim arguments passed to the target process), so the bug surfaces there as well.
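The POSIX-shell behavior described above can be checked with a short sketch. The helper show_arg is hypothetical and simply prints its first argument between angle brackets.

```shell
# Hypothetical helper: show the first argument exactly as the shell passed it.
show_arg() { printf '<%s>\n' "$1"; }

# The outer quotes are consumed by the shell; the inner ones arrive verbatim.
show_arg "print('hello')"
show_arg 'print("hello")'
```

Both calls deliver the inner quotes intact, which is why either form works with Rscript under bash.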
Because cmd.exe preserves the original quoting, RScript interprets 'print("hello")' in command line Rscript -e 'print("hello")' as a string literal rather than as a command, because it removes any " chars. with syntactic function on the command line first (whereas ' (single quotes) by convention do not have syntactic meaning on the command line), before the result is interpreted as an R script:
'print("hello")' is therefore parsed as 'print( + hello (the command-line " are stripped) + ), resulting in verbatim 'print(hello)' getting interpreted as R code, which is an R string literal that therefore prints as-is (the output uses "..." quoting, but that's just an artifact of output formatting; note that an explicit call to print() isn't necessary, the result of an expression - such as string literal 'print(hello)' in this case - is automatically printed).
By contrast, "print('hello')" is parsed as verbatim print('hello') (the command-line " are stripped), which - due to the absence of enclosing quoting - is then interpreted as a command, namely a print() function call, as intended.
Ultimately, there are no hard and fast rules in the anarchic world of process command-line parsing on Windows: it is ultimately up to each program to interpret its command line - this answer contains excellent background information.
Fortunately, however, there are widely adhered-to conventions, as implemented in the MS C/C++/.NET compilers and documented here.
Unfortunately, as of PowerShell 7.0, PowerShell doesn't adhere to these conventions, due to the aforementioned bug. Since the bug has been around since v1, users have learned to work around it, such as with manual \"-escaping, as shown above. The problem is that fixing the bug would break all workarounds. Implementing a fix as an experimental feature is now being considered, for v7.1 at the earliest - see this PR on GitHub and the associated discussion here, which suggests that, in addition to implementing the widely established conventions, accommodations be made for calls to batch files and msiexec.exe-style programs, which have non-conventional quoting requirements.
It might be worth taking a look through this PowerShell issue: Arguments for external executables aren't correctly escaped. The Native module by Michael Klement provides a workaround until the problem is fixed (and shouldn't be broken post-fix like many current workarounds).
"Note that it's not powershell doing the escaping incorrectly. Echoing only gives the expected results"
echo is a PowerShell function rather than an external program so you won't notice the broken behaviour when using it.
PS> Get-Command echo
CommandType     Name                     Version    Source
-----------     ----                     -------    ------
Alias           echo -> Write-Output
A better test would be to use the EchoArgs.exe command line tool from PowerShell Community Extensions (downloadable here).
PS> echoargs.exe 'print("hello")'
Arg 0 is <print(hello)>
Command line:
"E:\echoargs.exe" print("hello")
PS> echoargs.exe "print('hello')"
Arg 0 is <print('hello')>
Command line:
"E:\echoargs.exe" print('hello')
"Note that it's not powershell doing the escaping incorrectly. Echoing only gives the expected results:"
In the case of echo, it's echo itself that directly consumes the argument you pass to it, so you get the same result for single quotes or double quotes.
In the case of Rscript, I believe Rscript is just a convenient way of calling R with some additional arguments. (see https://swcarpentry.github.io/r-novice-inflammation/05-cmdline/ for explanation). Specifically, it says that "From this output, we learn that Rscript is just a convenience command for running R scripts...."
So maybe what's happening is that when you call Rscript, it's passing the argument to a separate process, and because of this it's trying to expand hello as a variable, leading to the error (in the PowerShell case).
As for cmd it has its own behavior for handling single and double quotes.
See: What does single-quoting do in Windows batch files?
and
Differences between single and double quotes in CMD
So the problem may not be with RScript. The resulting output of your use case may just be a side effect of how powershell and cmd handle double quotes and single quotes.
This may also explain why the problem is there only on windows, and not in Linux or MacOS.
Check out this one! https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_quoting_rules?view=powershell-7
expressions in single-quoted strings are not evaluated. They are interpreted as literals.
in a double-quoted string, expressions are evaluated, and the result is inserted in the string.
same rules apply for cmd

Handling "?" character passed to ZSH function

I'm having a problem setting up a simple function in ZSH.
I want to make a function that downloads only the mp3 audio from YouTube.
I use youtube-dl, and I want a simple function to make that easy for me:
ytmp3(){
youtube-dl -x --audio-format mp3 "$@"
}
So when I try
ytmp3 https://www.youtube.com/watch?v=_DiEbmg3lU8
I get
zsh: no matches found: https://www.youtube.com/watch?v=_DiEbmg3lU8
but if I try
ytmp3 "https://www.youtube.com/watch?v=_DiEbmg3lU8"
it works.
I figured out that the program runs (but won't download anything) if I remove the ? and all characters after it. So I guess that this is some sort of special character for zsh.
By default, the ZSH will try to "glob" patterns that you use on command lines (it will try to match the pattern to file names). If it can't make a match, you get the error you're getting ("no matches found").
You can disable this behaviour by disabling the nomatch option:
unsetopt nomatch
The manual page describes this option as follows (it describes what happens when the option is enabled):
If a pattern for filename generation has no matches, print an error, instead of leaving it unchanged in the argument list.
Try again with the option disabled:
$ unsetopt nomatch
$ ytmp3 https://www.youtube.com/watch?v=_DiEbmg3lU8
If you want to permanently disable the option, you can add the disable command to your ~/.zshrc file.
The question mark is part of ZSH's pattern matching, similarly to *. It means "Any character".
For instance, ls c?nfig will list both "config" and "cinfig", provided they exist.
So, yes, your problem is simply that zsh is trying to interpret the ? in the URL as a pattern to match to files, failing to find any, and crapping out. Escape the ? with a \ or put quotes around it, like you did, to fix it.
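The matching itself can be tried safely in a scratch directory. The sketch below uses a hypothetical function glob_demo and the file names from the example above. It behaves the same in sh, bash and zsh because the unquoted pattern does have matches here.

```shell
# Hypothetical demo, run in a subshell so the cd does not leak out.
glob_demo() (
    tmpdir=$(mktemp -d) || exit 1
    cd "$tmpdir" || exit 1
    touch config cinfig

    printf '%s\n' c?nfig     # unquoted: expanded to the matching file names
    printf '%s\n' 'c?nfig'   # quoted: passed through literally
    printf '%s\n' c\?nfig    # escaped: also passed through literally

    cd / && rm -rf "$tmpdir"
)

glob_demo
```

The first printf prints cinfig and config on separate lines; the other two print the pattern c?nfig unchanged.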

How can I implement the command 'ls' with wildcard, '*'?

EDIT #1: I'm under the constraint that all arguments are enclosed in quotes, so that the shell does not expand any argument containing * into the corresponding paths.
EDIT #2: To handle patterns such as */*, ../*, and dirA/*/file.out, should I use an iterative loop or a recursive call?
I have just learned about the function fnmatch(), but I don't know where to start.
There are many possible cases, and I'm confused about how to deal with all of them.
For example, let me assume that the executable program is a.out.
$ ./a.out -l */*
$ ./a.out -l ../*
$./a.out -l [file_name] [directory_name]
/* Since I also have to implement ls command with no wildcard. */
What should I do? Any advice would be awesome.
Thank you in advance.
Your problem is that the shell replaces the wildcard character * with all of the filenames matching the pattern.
Solution:
If you do not want to use this feature of bash, just put quotation marks around your command-line arguments.
Called that way, your program receives the original arguments, wildcards included.
After this, you can list all the filenames with their paths, for example using a recursive algorithm, and apply some matching to each path string as you visit it.
If you want to be a good unix citizen, the rule is Don't do filename globbing unless you are writing a shell.
You want to write an ls-like program? Don't do any wildcard expansion. Don't treat "*" specially. Just treat your argv as a list of filenames. If your program handles these cases:
./a.out file1
./a.out file1 file2 file3
Then it will also handle
./a.out file*
correctly because the shell will do the expansion and your program won't need to know about it. And besides that, it will handle this:
zsh% ./a.out **/file<40-185>~file<90-100>(.mm-30OL[1,2])
which in zsh expanded glob syntax means: expand file40 through file185, except for file90 through file100, include only the ones that have been modified in the last 30 minutes, and use only the largest 2 files in the resulting set.
fnmatch is never going to do anything like that. But these fancy globs can be used with any command that just takes a filename list and doesn't care where it came from.
When you're in a situation where you can't take a list of filenames from the command line, then consider using fnmatch. ls isn't one of those situations.
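The division of labor between shell and program can be sketched with a shell function standing in for ./a.out. The names list_args and argv_demo are hypothetical, and the files are created in a scratch directory.

```shell
# Stand-in for ./a.out: print each argument exactly as received.
list_args() {
    for name in "$@"; do
        printf '%s\n' "$name"
    done
}

# Hypothetical demo, run in a subshell so the cd does not leak out.
argv_demo() (
    tmpdir=$(mktemp -d) || exit 1
    cd "$tmpdir" || exit 1
    touch file1 file2 file3

    list_args file*      # the shell expands the pattern; three names arrive
    list_args 'file*'    # quoted: the literal pattern arrives instead

    cd / && rm -rf "$tmpdir"
)

argv_demo
```

The program never sees the * in the first call; it only sees it in the second, where the caller quoted it on purpose.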

Checking for environment variable

Using this UNIX script I am able to check if variable TEST_VAR is set or not:
: ${TEST_VAR:?"Not set or empty."}
I am new to Unix, so can someone please explain what this command does?
From bash manual:
${parameter:?word}
If parameter is null or unset, the expansion of word (or a message to
that effect if word is not present) is written to the standard error
and the shell, if it is not interactive, exits. Otherwise, the value
of parameter is substituted.
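A minimal sketch of that behavior: the hypothetical function require_test_var below runs the check in a subshell (note the parentheses around its body), so a failed expansion ends only that subshell and not the calling script.

```shell
# Hypothetical wrapper: succeeds only if TEST_VAR is set and non-empty.
# The body is a subshell, so the exit triggered by :? stays contained.
require_test_var() (
    : "${TEST_VAR:?Not set or empty.}"
)

unset TEST_VAR
require_test_var 2>/dev/null || echo "missing"

TEST_VAR=hello
require_test_var && echo "present"
```

Without the 2>/dev/null redirection, the failing call would also write the message "Not set or empty." to standard error.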
The colon is the original shell comment notation (before '#' to end of line). For a long time, Bourne shell scripts had a colon as the first character. The C Shell would read a script and use the first character to determine whether it was for the C Shell (a '#' hash) or the Bourne shell (a ':' colon). Then the kernel got in on the act and added support for '#!/path/to/program', the Bourne shell got '#' comments, and the colon convention went by the wayside. In the command above, the colon acts as a no-op whose arguments are still expanded, which is what triggers the check.
Have a look at this similar question:
What's a concise way to check that environment variables are set in a Unix shell script?

Resources