How do Perl Cwd::cwd and Cwd::getcwd functions differ? - working-directory

The question
What is the difference between Cwd::cwd and Cwd::getcwd in Perl, generally, without regard to any specific platform? Why does Perl have both? What is the intended use, which one should I use in which scenarios? (Example use cases will be appreciated.) Does it matter? (Assuming I don’t mix them.) Does choice of either one affect portability in any way? Which one is more commonly used in modules?
Even if I interpret the manual is saying that except for corner cases cwd is `pwd` and getcwd just calls getcwd from unistd.h, what is the actual difference? This works only on POSIX systems, anyway.
I can always read the implementation but that tells me nothing about the meaning of those functions. Implementation details may change, not so defined meaning. (Otherwise a breaking change occurs, which is serious business.)
What does the manual say
Quoting Perl’s Cwd module manpage:
Each of these functions are called without arguments and return the absolute path of the current working directory.
getcwd
my $cwd = getcwd();
Returns the current working directory.
Exposes the POSIX function getcwd(3) or re-implements it if it's not available.
cwd
my $cwd = cwd();
The cwd() is the most natural form for the current architecture. For most systems it is identical to `pwd` (but without the trailing line terminator).
And in the Notes section:
Actually, on Mac OS, the getcwd(), fastgetcwd() and fastcwd() functions are all aliases for the cwd() function, which, on Mac OS, calls `pwd`. Likewise, the abs_path() function is an alias for fast_abs_path()
OK, I know that on Mac OS1 there is no difference between getcwd() and cwd() as both actually boil down to `pwd`. But what on other platforms? (I’m especially interested in Debian Linux.)
1 Classic Mac OS, not OS X. $^O values are MacOS and darwin for Mac OS and OS X, respectively. Thanks, #tobyink and #ikegami.
And a little meta-question: How to avoid asking similar questions for other modules with very similar functions? Is there a universal way of discovering the difference, other than digging through the implementation? (Currently, I think that if the documentation is not clear about intended use and differences, I have to ask someone more experienced or read the implementation myself.)

Generally speaking
I think the idea is that cwd() always resolves to the external, OS-specific way of getting the current working directory. That is, running pwd on Linux, command /c cd on DOS, /usr/bin/fullpath -t in QNX, and so on — all examples are from actual Cwd.pm. The getcwd() is supposed to use the POSIX system call if it is available, and falls back to the cwd() if not.
Why we have both? In the current implementation I believe exporting just getcwd() would be enough for most of systems, but who knows why the logic of “if syscall is available, use it, else run cwd()” can fail on some system (e.g. on MorphOS in Perl 5.6.1).
On Linux
On Linux, cwd() will run `/bin/pwd` (will actually execute the binary and get its output), while getcwd() will issue getcwd(2) system call.
Actual effect inspected via strace
One can use strace(1) to see that in action:
Using cwd():
$ strace -f perl -MCwd -e 'cwd(); ' 2>&1 | grep execve
execve("/usr/bin/perl", ["perl", "-MCwd", "-e", "cwd(); "], [/* 27 vars */]) = 0
[pid 31276] execve("/bin/pwd", ["/bin/pwd"], [/* 27 vars */] <unfinished ...>
[pid 31276] <... execve resumed> ) = 0
Using getcwd():
$ strace -f perl -MCwd -e 'getcwd(); ' 2>&1 | grep execve
execve("/usr/bin/perl", ["perl", "-MCwd", "-e", "getcwd(); "], [/* 27 vars */]) = 0
Reading Cwd.pm source
You can take a look at the sources (Cwd.pm, e.g. in CPAN) and see that for Linux cwd() call is mapped to _backtick_pwd which, as the name suggests, calls the pwd in backticks.
Here is a snippet from Cwd.pm, with my comments:
unless ($METHOD_MAP{$^O}{cwd} or defined &cwd) {
...
# some logic to find the pwd binary here, $found_pwd_cmd is set to 1 on Linux
...
if( $os eq 'MacOS' || $found_pwd_cmd )
{
*cwd = \&_backtick_pwd; # on Linux we actually go here
}
else {
*cwd = \&getcwd;
}
}
Performance benchmark
Finally, the difference between two is that cwd(), which calls another binary, must be slower. We can make some kind of a performance test:
$ time perl -MCwd -e 'for (1..10000) { cwd(); }'
real 0m7.177s
user 0m0.380s
sys 0m1.440s
Now compare it with the system call:
$ time perl -MCwd -e 'for (1..10000) { getcwd(); }'
real 0m0.018s
user 0m0.009s
sys 0m0.008s
Discussion, choice
But as you don't usually query the current working directory too often, both options will work — unless you cannot spawn any more processes for some reason related to ulimit, out of memory situation, etc.
Finally, as for selecting which one to use: for Linux, I would always use getcwd(). I suppose you will need to make your tests and select which function to use if you are going to write a portable piece of code that will run on some really strange platform (here, of course, Linux, OS X, and Windows are not in the list of strange platforms).

Related

Makefile wildcard function evaluates to true against empty string

Using GNU Make 4.2.1
I'm writing a Makefile, and I want to use a conditional statement to have make check whether it is on a specific remote server or not. I'd like to do this using the unix HOSTNAME environment variable. I also want it to run regardless of subdomain, so I use the make wildcard function.
ifeq ($(wildcard *.remote.server.com),$(HOSTNAME))
echo "ON REMOTE SERVER"
else
echo "NOT ON REMOTE SERVER"
endif
This looks like it would work, but on my local machine the HOSTNAME environment variable is not set and the ifeq test evaluates to true and prints ON REMOTE SERVER.
This doesn't make sense to me; *.remote.server.com is being compared to an empty string and should evaluate false and print NOT ON REMOTE SERVER.
Am I missing something about either unix environment variables, wildcard, or make conditionals, or all three?
Edit: Problem resolved. Learned that this is not the spot to use the wildcard function. Instead, used something similar to the following line
ifeq "$(shell hostname | sed -e 's/.*.remote.server.com/remote.server.com/')" "remote.server.com"
Based on this response:
How to use bash regex inside Makefile Target
You could just print the values of these things to see what they are. Something like $(info wildcard='$(wildcard *.remote.server.com)' hostname='$(HOSTNAME)')? The reason for this behavior depends entirely on your local system, which we do not have access to.
However, I don't see why you say that *.remote.server.com is being compared to an empty string. You've written $(wildcard *.remote.server.com) and presumably you don't have a file that matched the glob *.remote.server.com, which means the wildcard function expands to the empty string.
I think you might be confused about what the wildcard function does: what did you expect it to do? It has nothing whatever to do with hostnames.

How does execlp work exactly?

So I am looking at my professor's code that he handed out to try and give us an idea of how to implement >, <, | support into our unix shell. I ran his code and was amazed at what actually happened.
if( pid == 0 )
{
close(1); // close
fd = creat( "userlist", 0644 ); // then open
execlp( "who", "who", NULL ); // and run
perror( "execlp" );
exit(1);
}
This created a userlist file in the directory I was currently in, with the "who" data inside that file. I don't see where any connection between fd, and execlp are being made. How did execlp manage to put the information into userlist? How did execlp even know userlist existed?
Read Advanced Linux Programming. It has several chapters related to the issue. And we cannot explain all this in a few sentences. See also the standard stream and process wikipages.
First, all the system calls (see syscalls(2) for a list, and read the documentation of every individual system call that you are using) your program is doing should be tested against failure. But assume they all succeed. After close(1); the file descriptor 1 (STDOUT_FILENO) is free. So creat("userlist",0644) is likely to re-use it, hence fd is 1; you have redirected your stdout to the newline created userlist file.
At last, you are calling execlp(3) which will call execve(2). When successful, your entire process is restarted with the new executable (so a fresh virtual address space is given to it), and its stdout is still the userlist file descriptor. In particular (unless execve fails) the perror call is not reached.
So your code is a bit what a shell running who > userlist is doing; it does a redirection of stdout to userlist and runs the who command.
If you are coding a shell, use strace(1) -notably with -f option- to understand what system calls are done. Try also strace -f /bin/sh -c ls to look into the behavior of a shell. Study also the source code of existing free software shells (e.g. bash and sash).
See also this and the references I gave there.
execlp knowns nothing. Before execing stdout was closed and a file opened, so the descriptor is the one corresponding to stdout (opens always returns the lowest free descriptor). At that point the process has an "stdout" plugged to the file. Then exec is called and this replaces to whole address space, but some properties remains as the descriptors, so know the code of who is executed with an stdout that correspond to the file. This is the way redirections are managed by shells.
Remember that when you use printf (for example) you never specify what stdout exactly is... That can be a file, a terminal, etc.
Basile Starynkevitch correctly explained:
After close(1); the file descriptor 1 (STDOUT_FILENO) is free. So creat("userlist",0644) is likely to re-use it…
This is because, as Jean-Baptiste Yunès wrote, "opens always returns the lowest free descriptor".
It should be stressed that the professor's code only likely works; it fails if file descriptor 0 is closed.

Phabricator Making Assumptions About Environment

I am attempting to get Phabricator running on Solaris over apache. The website is working, but all of the cli scripts are not. For example, phd.
The first problem, is that it is not passing arguments to the underling manage-daemons.php script that it invokes. Looking at the phd file, this does not surprise me:
$> cat phd
../scripts/daemon/manage_daemons.php
Now, given my default shell is bash, this isn't going to pass-through my arguments. To do this, I have modified the script:
#! /bin/bash
../scripts/daemon/manage_daemons.php $*
This will now pass-through the arguments, but it's now failing to find transative scripts it requires via relative path:
./phd start
Preparing to launch daemons.
NOTE: Logs will appear in '/var/tmp/phd/log/daemons.log'.
Launching daemon "PhabricatorRepositoryPullLocalDaemon".
[2014-05-09 19:29:59] EXCEPTION: (CommandException) Command failed with error #127!
COMMAND
exec ./phd-daemon 'PhabricatorRepositoryPullLocalDaemon' --daemonize --log='/var/tmp/phd/log/daemons.log' --phd='/var/tmp/phd/pid'
STDOUT
(empty)
STDERR
./phd-daemon: line 1: launch_daemon.php: not found
at [/XXX/XXX/libphutil/src/future/exec/ExecFuture.php:398]
#0 ExecFuture::resolvex() called at [/XXX/XXX/phabricator/src/applications/daemon/management/PhabricatorDaemonManagementWorkflow.php:167]
#1 PhabricatorDaemonManagementWorkflow::launchDaemon(PhabricatorRepositoryPullLocalDaemon, Array , false) called at [/XXX/XXX/phabricator/src/applications/daemon/management/PhabricatorDaemonManagementWorkflow.php:246]
#2 PhabricatorDaemonManagementWorkflow::executeStartCommand() called at [/XXX/XXX/phabricator/src/applications/daemon/management/PhabricatorDaemonManagementStartWorkflow.php:18]
#3 PhabricatorDaemonManagementStartWorkflow::execute(Object PhutilArgumentParser) called at [/XXX/XXX/libphutil/src/parser/argument/PhutilArgumentParser.php:396]
#4 PhutilArgumentParser::parseWorkflowsFull(Array of size 9 starting with: { 0 => Object PhabricatorDaemonManagementListWorkflow }) called at [/XXX/XXX/libphutil/src/parser/argument/PhutilArgumentParser.php:292]
#5 PhutilArgumentParser::parseWorkflows(Array of size 9 starting with: { 0 => Object PhabricatorDaemonManagementListWorkflow }) called at [/XXX/XXX/phabricator/scripts/daemon/manage_daemons.php:30]
Note I have obscured my paths with XXX as they give away sensitive information.
Now, obviously I shouldn't be modifying these scripts. This is an indication that some prerequisite is not set up properly.
It's clear to me that Phabricator is making some (bold) assumption about my setup. But I'm not quite sure what...?
These are supposed to be symlinks. For example, if you look at "phd" in the repository on GitHub, you can see that the file type is "symbolic link":
https://github.com/facebook/phabricator/blob/master/bin/phd
Something in your environment is incorrectly turning the symlinks into normal files. I'm not aware of any Git configuration which can cause this, although it's possible there is something. One situation where I've seen this happen is when a working copy was cloned, then copied using something like rsync without appropriate flags to preserve symlinks.

Calling external program from R with multiple commands in system

I am new to programming and mainly I am able to do some scripts within R, but for my work I need to call an external program. For this program to work on the ubuntu's terminal I have to first use setenv and then execute the program. Googling I've found the system () and Sys.setenv() functions, but unfortunately I can make it function.
This is the code that does work in the ubuntu terminal:
$ export PATH=/home/meme/bin:$PATH
$ mast "/home/meme/meme.txt" "/home/meme/seqs.txt" -o "/home/meme/output" -comp
Where the first two arguments are input files, the -o argument is the output directory and the -comp is another parameter for the program to run.
The reason that I need to do it in R despite it already works in the terminal is because I need to run the program 1000 times with 1000 different files so I want to make a for loop where the input name changes in every loop and then analyze every output in R.
I have already tried to use:
Sys.setenv(PATH="/home/meme/bin"); system(mast "/home/meme/meme.txt" "/home/meme/seqs.txt" -o "/home/meme/output" -comp )
and
system(Sys.setenv(PATH="/home/meme/bin") && mast "/home/meme/meme.txt" "/home/meme/seqs.txt" -o "/home/meme/output" -comp )
but always received:
Error: unexpected constant string in "system(mast "/home/meme/meme.txt""
or
Error: unexpected symbol in "system(Sys.setenv(PATH="/home/meme/bin") && mast "/home/meme/meme.txt""
At this point I have run out of ideas to make this work. If this has already been answered, then my googling have just been poor and I would appreciate any links to its response.
Thank you very much for your time.
Carlos
Additional details:
I use Ubuntu 12.04 64-bits version, RStudio version 0.97.551, R version 3.0.2 (2013-09-25) -- "Frisbee Sailing" Platform: x86_64-pc-linux-gnu (64-bit).
The program I use (MAST) finds a sequence pattern in a list of letters and is part of the MEME SUIT version 4.9.1 found in http://meme.nbcr.net/meme/doc/meme-install.html and run through command line. The command-line usage for mast is:
mast <motif file> <sequence file> [options]
Construct the string you want to execute with paste and feed that to system:
for(i in 1:10){
cmd=paste("export FOO=",i," ; echo \"$FOO\" ",sep='')
system(cmd)
}
Note the use of sep='' to stop paste putting spaces in, and back-quoting quote marks in the string to preserve them.
Test before running by using print(cmd) instead of system(cmd) to make sure you are getting the right command built. Maybe do:
if(TESTING){print(cmd)}else{system(cmd)}
and set TESTING=TRUE or FALSE in R before running.
If you are going to be running more than one shell command per system call, it might be better to put them all in one shell script file and call that instead, passing parameters from R. Something like:
cmd = paste("/home/me/bin/dojob.sh ",i,i+1)
system(cmd)
and then dojob.sh is a shell script that parses the args. You'll need to learn a bit more shell scripting.

frama-c jessie killed during VC generation

I'm trying to apply frama-c/jessie on a module of a proprietary safety critical system from our customer. The function under analysis is pretty big (image 700 uncommented lines count) with a lot of conditional statements as well as complex (&&, ||, etc).
I got this error message when I ran it on Ubuntu VM 64 bit machine. It appears Error 137 is related to memory size, etc. But I'm not quite sure.
Any suggestion for how to approach this error is greatly appreciated.
[
formal_verification]$ frama-c -jessie test.c
[kernel] preprocessing with "gcc -C -E -I. -dD test.c"
[jessie] Starting Jessie translation
[jessie] Producing Jessie files in subdir test.jessie
[jessie] File test.jessie/test.jc written.
[jessie] File test.jessie/test.cloc written.
[jessie] Calling Jessie tool in subdir tests.jessie
Generating Why function testFun
[jessie] Calling VCs generator.
gwhy-bin [...] why/test.why
Computation of VCs...
Killed
make: *** [test.stat] Error 137
with a lot of conditional statements as well as complex (&&, ||, etc).
You should use the so-called “fast WP” option when analyzing functions with lots of nested conditionals. Otherwise, the target does not even need to be very large to cause a blowup.
It happens to be the example in Jessie's manual for passing options to Why (it is really a Why option):
-jessie-why-opt=<s>
give an option to Why (e.g., -fast-wp)
You would therefore use -jessie-why-opt=-fast-wp.

Resources