How to specify base directory for FUSE filesystem? - mount

I am trying to create a FUSE filesystem called ordered-dirs using the Haskell wrapper over libfuse, HFuse. This filesystem is a "derived filesystem", i.e. it takes an existing directory (the "base directory") and produces a different view of it.
However, when I try to run my FUSE filesystem program, specifying the arguments in the ordinary mount way, I get an error:
$ ordered-dirs /home/robin/tasks/ /home/robin/to
fuse: invalid argument `/home/robin/to'
There is no way in HFuse (or in libfuse, it seems) to get the base directory (the first argument), so I had just written my own code to get it. But it's not this code that's failing - it's code within C libfuse itself - as the error message indicates.
So what is the correct way to pass the base directory to a fuse filesystem executable that uses libfuse to parse its arguments?

Surprisingly, it seems that the way to do this is to simply strip the base directory argument from the command-line arguments that are passed to the libfuse parser, so that libfuse never sees it.
In the particular case of HFuse, this can be done by calling fuseRun instead of fuseMain, which allows the command-line arguments to be passed in explicitly. Here is the relevant code (in which I've called the base directory source):
import System.Environment (getArgs)
import System.Fuse (defaultExceptionHandler, fuseRun)

main :: IO ()
main = do
  args <- getArgs
  let (maybeSource, remainder) = extractSource args
  source <- maybe (fail "source not specified") return maybeSource
  fuseRun "ordered-dirs" remainder (orderedDirOps source) defaultExceptionHandler

Related

robotframework - get all the variables/arguments passed to an execution

Is there a way to access all the variables/arguments passed through the command line or a variable file (-V option) during a Robot Framework execution? I know that in Python the execution can access them via sys.argv.
The answer for getting the CLI arguments is inside your question - just look at the content of the sys.argv, you'll see everything that was passed to the executor:
${args}=    Evaluate    sys.argv    sys
Log To Console    ${args}
That'll return a list, where the executable itself (run.py) is the first member, and all arguments and their values are present in the order given during the execution:
['C:/my_directories/rf-venv/Lib/site-packages/robot/run.py', '--outputdir', 'logs', '--variable', 'USE_BROWSERSTACK:true', '--variable', 'IS_DEV_ENVIRONMENT:false', '--include', 'worky', 'suites\\test_file.robot']
You explicitly mention variable files; those are a little bit trickier - the framework parses the files itself and creates the variables according to its rules. You can naturally see them in the CLI args up there, and the other possibility is to use the built-in keyword Get Variables, which "Returns a dictionary containing all variables in the current scope." (quote from its documentation). Keep in mind, though, that these are all variables - not only the ones passed on the command line, but also the ones defined in the suite, imported keywords, etc.
You also have Log Variables to see their names and values "at current scope".
There is no possibility to see the arguments passed to robot.

gperftools failing to identify files

Is there a way to avoid Google Performance Tools listing files as "??:?", that is, failing to locate which file contains the function it is reporting on? How can I work out which library contains the function being called?
$ env LD_PRELOAD="/usr/lib/libprofiler.so.0" \
CPUPROFILE=output.prof python script.py
$ google-pprof --text --files /usr/bin/python output.prof
Using local file /usr/bin/python.
Using local file output.prof.
Removing _L_unlock_13 from all stack traces.
Total: 433 samples
362 83.6% 83.6% 362 83.6% dtrsm_ ??:?
58 13.4% 97.0% 58 13.4% dgemm_ ??:?
1 0.2% 97.2% 1 0.2% PyDict_GetItem /.../Objects/dictobject.c
1 0.2% 97.5% 1 0.2% PyParser_AddToken /.../Parser/parser.c
...
I am aiming to be able to profile the C code in a python package that has many compiled C extension modules. In the toy example above, what would I do to track down where "dtrsm_" is defined? If there are multiple loaded libraries that contain functions with that same name, is there any way to tell which version is being called?
C/C++ won't compile if the same pre-processed source file (e.g. with #includes expanded) contains duplicate definitions for the same symbol. (Note that in the case of C++, symbols are mangled, according to compiler-specific schemes, to incorporate the argument signature so as to facilitate overloaded functions, which could not otherwise be differentiated.)
The linker is only concerned with unresolved symbols (so there ought to be nothing preventing multiple libraries from concurrently calling their own respective internally-defined functions with coincident names). If a file invokes a declared but undefined function, and multiple available libraries implement that symbol, then the linker is free to choose (say, by precedence in a search path) which version gets substituted in. (Incidentally, this is the same mechanism by which profilers such as gperftools or hpctoolkit are able to inject themselves and alter the normal behaviour of another application.)
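As a toy illustration of that interposition mechanism (not how gperftools itself is structured), a preloaded shared object can supply its own definition of a libc function and forward to the real one via dlsym:

/* Hypothetical example: compile with
 *   gcc -shared -fPIC -o libtrace.so trace.c -ldl
 * and run a program under LD_PRELOAD=./libtrace.so.
 * The dynamic linker resolves puts() to this definition first. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int puts(const char *s)
{
    /* look up the "next" (real) definition of puts, i.e. the one in libc */
    int (*real_puts)(const char *) =
        (int (*)(const char *)) dlsym(RTLD_NEXT, "puts");
    fprintf(stderr, "[trace] puts(\"%s\")\n", s);
    return real_puts(s);
}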
Since different libraries are mapped to separate pages of memory, it ought to be possible to identify (from memory addresses) which library contains the executing version of a function. Indeed, the GNU debugger can identify the library that code is contained by, even when it fails to name a function.
$ gdb python
(gdb) run -c "from numpy import *; linalg.inv(random.random((1000,1000)))"
CTRL-C
(gdb) backtrace
#0 0x00007ffff5ba9df8 in dtrsm_ () from /usr/lib/libblas.so.3
...
#3 0x00007ffff420df83 in ?? () from /.../numpy/linalg/_umath_linalg.so
Linux (or rather the GNU C library) provides the "backtrace" call (for getting a list of pointers from the call stack), and the "backtrace_symbols" call for automatically converting each of those pointers to a descriptive string such as:
"/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fc429929ec5]"
Gperftools can (judging from a query on the GitHub mirror) call the generic "backtrace", but instead of "backtrace_symbols" it "forks out to pprof to do the actual symbolizing". This is a fairly epic Perl script, and is likely where the "??" comes from.
Crucially, google-pprof is trying to report on the source-file (and line-number) which defines the function, not the binary-file containing the machine-code (that is typically quoted in stack traces). It invokes the "nm" utility. On my system it appears (by running "nm -l -D") that libblas, unlike libc and the python binary, has been stripped of such debugging symbols (presumably for optimisation), explaining the result.
To answer the original question: the call-stack samples should definitively and explicitly specify which version is being called. These can probably be dumped using an option added to google-pprof several months ago, or (for time-intensive functions) can be roughly ascertained by manual resampling using gdb. (It's even conceivable that google-pprof could be adjusted to explicitly identify the binary paths in its output summaries.) Alternatively, one can run "nm" (and grep) on the candidate binaries/libraries (a short-list of which can be obtained by running "strings" on the profiler's raw output, among other methods). If the source is accessible (to grep) or the libraries are popular (on the web), then of course (and per Mike Dunlavey) it may be easiest to just query for the function name. In theory the "??:?" may be addressed by carefully recompiling the offending objects.
Just Google the offending function names. The ones you show above come from BLAS, the low-level library that LAPACK builds on. dtrsm is for solving a (triangular) matrix equation. dgemm is for multiplying matrices.
What you need to know is 1) why they are being called, and 2) how big the matrices are.
To find out why they are being called, what I do is just examine individual stack samples, as here.
The reason matrix size matters is that, if the matrices are small, these routines can actually spend a relatively large fraction of their time just classifying their inputs, such as by calling the function LSAME.

Julia: check whether c library exists

Is there a way to check whether a C library can be found by the system?
I tried to use a try catch block on a library call to test whether it exists, but that actually kills the program.
try
    ccall( (:func, "libfoo"), Bool, () )
catch
    println("This line is never called. Ever")
end
The associated error is:
ERROR: error compiling anonymous: could not load module libfoo: libfoo: cannot open shared object file: No such file or directory
You could look before you leap using find_library:
julia> find_library(["libc"])
"libc"
julia> find_library(["libfoo"])
""
where you'll get the empty string if not found.
julia> help(find_library)
INFO: Loading help data...
Base.find_library(names, locations)
Searches for the first library in "names" in the paths in the
"locations" list, "DL_LOAD_PATH", or system library paths (in
that order) which can successfully be dlopen'd. On success, the
return value will be one of the names (potentially prefixed by one
of the paths in locations). This string can be assigned to a
"global const" and used as the library name in future
"ccall"'s. On failure, it returns the empty string.

How can one really create a process using Unix.create_process in OCaml?

I have tried
let _ = Unix.create_process "ls" [||] Unix.stdin Unix.stdout Unix.stderr
in utop, it will crash the whole thing.
If I write that into a .ml file, compile it, and run it, it will crash the terminal and my Ubuntu will throw a system error.
But why?
The right way to call it is:
let pid = Unix.create_process "ls" [|"ls"|] Unix.stdin Unix.stdout Unix.stderr
The first element of the array must be the "command" name.
On some systems /bin/ls is a link to some bigger executable that will look at argv.(0) to know how to behave (c.f. Busybox); so you really need to provide that info.
(You see that more often with /usr/bin/vi, which on many systems is now a symlink to vim.)
Unix.create_process actually calls fork and then does an execvpe, which itself calls the execv primitive (in the OCaml C implementation of the Unix module).
That function then calls cstringvect (a helper function on the C side of the module implementation), which translates the arg parameter into an array of C strings, with the last entry set to NULL. However, execve and the like expect, by convention (see the execve(2) Linux man page), the first entry of that array to be the name of the program:
argv is an array of argument strings passed to the new program. By
convention, the first of these strings should contain the filename
associated with the file being executed.
That first entry (or rather, the copy it receives) can actually be changed by the program receiving these args, and is what gets displayed by ps, top, etc.
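For comparison, here is what that convention looks like at the C level (a small sketch, not code from the OCaml runtime):

/* The argv[0] convention that execvpe/execvp expect:
 * the first element of the argument vector is the program name,
 * and the array is terminated by a NULL pointer. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char *argv[] = { "ls", "-l", NULL };  /* argv[0] = command name */
    execvp("ls", argv);                   /* only returns on failure */
    perror("execvp");
    return 1;
}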

Passing arguments to execl

I want to create my own pipeline like in a Unix terminal (just for practice). It should take the applications to execute in quotes, like this:
pipeline "ls -l" "grep" ....
I know that I should use fork(), execl() (exec*) and the API to redirect stdin and stdout. But is there an alternative to execl that executes an application with its arguments given as a single string containing the application path and the arguments? Is there a way to avoid parsing ls -l manually and instead pass it as one argument to execl?
If you have only a single command line instead of an argument vector, let the shell do the parsing for you:
execl("/bin/sh", "sh", "-c", the_command_line, NULL);
Of course, don't let untrusted remote user input into this command line. But if you are dealing with untrusted remote user input to begin with, you should try to arrange to pass an actual list of isolated arguments to the target application, as per normal usage of exec[vl], rather than a command line.
Realistically, you can only really use execl() when the number of arguments to the command are known at compile time. In a shell, you'll normally use execv() or execvp() instead; these can handle an arbitrary number of arguments to the command to be executed. In theory, you use execv() when the path name of the command is given and execvp() (which does a PATH-based search for the command) when it isn't. However, execvp() handles the 'path given' case, so simply use execvp().
So, for your pipeline command, you'll end up with one child using something equivalent to:
char *args_1[] = { "ls", "-l", 0 };
execvp(args_1[0], args_1);
The other child will end up using something equivalent to:
char *args_2[] = { "grep", "pattern", 0 };
execvp(args_2[0], args_2);
Except, of course, that you'll have created those strings from the command line arguments instead of by initialization as shown. Note that grep requires a pattern to search for.
You've still got plumbing issues to resolve. Make sure you close enough pipe file descriptors. When you dup() or dup2() a pipe to standard input or standard output, you close both the file descriptors from the pipe() function.
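For reference, here is a rough sketch of that plumbing for a fixed two-stage pipeline (ls -l | grep pattern), with the descriptor bookkeeping described above and error handling kept to a minimum:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) == -1) { perror("pipe"); return 1; }

    if (fork() == 0) {                 /* first child: ls -l */
        dup2(fd[1], STDOUT_FILENO);    /* write end becomes stdout */
        close(fd[0]);                  /* close both pipe descriptors */
        close(fd[1]);
        char *args_1[] = { "ls", "-l", 0 };
        execvp(args_1[0], args_1);
        perror("execvp ls");
        _exit(1);
    }

    if (fork() == 0) {                 /* second child: grep pattern */
        dup2(fd[0], STDIN_FILENO);     /* read end becomes stdin */
        close(fd[0]);
        close(fd[1]);
        char *args_2[] = { "grep", "pattern", 0 };
        execvp(args_2[0], args_2);
        perror("execvp grep");
        _exit(1);
    }

    close(fd[0]);                      /* parent must close its copies too, */
    close(fd[1]);                      /* or grep never sees end-of-file    */
    while (wait(NULL) > 0)             /* reap both children */
        ;
    return 0;
}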
