Changing function reference in Mach-o binary - dynamic-linking

I need to change the reference to a function in a Mach-O binary to a custom function defined in my own dylib. The process I am following now is:
Replace references to the older function with the new one, e.g. _fopen to _mopen, using sed.
Open the Mach-O binary in MachOView to find the addresses of the entities I want to change, then manually change the information in the binary using a hex editor.
Is there a way to automate this process, i.e. write a program that reads the symbols and dynamic loading info and then changes them in the executable? I was looking at the Mach-O header files in /usr/include/mach-o but am not entirely sure how to use them to get this information. Are there any libraries, in C or Python, that help do this?

Interesting question. I am trying to do something similar with a static lib; see if this helps.

varrunr - you can easily achieve most if not all of this functionality using dyld's interposition. You create your own library and declare your interposing functions, like so:
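#include <fcntl.h>   // declares open

// Replacement with the same signature as open(2); implemented elsewhere in this dylib
int my_open(const char *path, int flags, ...);

// This is the expected interpose structure
typedef struct interpose_s {
    void *new_func;
    void *orig_func;
} interpose_t;

// "used" keeps the compiler from discarding the otherwise unreferenced table
static const interpose_t interposing_functions[]
    __attribute__ ((used, section("__DATA, __interpose"))) = {
    { (void *)my_open, (void *)open }
};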
...and you just implement your my_open. Inside the interposing functions, all references to the original still work, which makes this ideal for wrappers. You can also force-insert your dylib using DYLD_INSERT_LIBRARIES (same principle as LD_PRELOAD on Linux).
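On the automation part of the question: the structures in /usr/include/mach-o/loader.h describe the file layout directly, so a small program can walk the load commands itself. Below is a minimal sketch, assuming a thin (non-fat) 64-bit Mach-O image already read into memory on a little-endian host. The structs are hand-declared mirrors of the layouts of mach_header_64, load_command and symtab_command so that the sketch compiles without the macOS headers, and find_symtab is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hand-declared mirrors of the <mach-o/loader.h> layouts (assumption: on
   macOS you would include that header instead of redeclaring these). */
#define MACHO_MAGIC_64  0xfeedfacfu  /* MH_MAGIC_64 */
#define MACHO_LC_SYMTAB 0x2u         /* LC_SYMTAB */

struct macho_header_64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

struct macho_load_command {
    uint32_t cmd, cmdsize;
};

struct macho_symtab_command {
    uint32_t cmd, cmdsize;
    uint32_t symoff, nsyms;   /* file offset and count of symbol entries */
    uint32_t stroff, strsize; /* file offset and size of the string table */
};

/* Hypothetical helper: walk the load commands of an in-memory image and
   copy out LC_SYMTAB. Returns 0 on success, -1 if the image is not a
   64-bit Mach-O, is truncated, or has no symbol table command. */
int find_symtab(const unsigned char *buf, size_t len,
                struct macho_symtab_command *out)
{
    struct macho_header_64 hdr;
    assert(buf != NULL && out != NULL);
    if (len < sizeof hdr)
        return -1;
    memcpy(&hdr, buf, sizeof hdr);
    if (hdr.magic != MACHO_MAGIC_64)
        return -1;

    size_t off = sizeof hdr; /* load commands start right after the header */
    for (uint32_t i = 0; i < hdr.ncmds; i++) {
        struct macho_load_command lc;
        if (off > len || len - off < sizeof lc)
            return -1;
        memcpy(&lc, buf + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > len - off)
            return -1;
        if (lc.cmd == MACHO_LC_SYMTAB) {
            if (lc.cmdsize < sizeof *out)
                return -1;
            memcpy(out, buf + off, sizeof *out);
            return 0;
        }
        off += lc.cmdsize; /* each command records its own size */
    }
    return -1;
}
```

The symbol names are NUL-terminated strings in the string table at stroff. Rewriting a name in place is only safe when the new name is no longer than the old one, which is why a same-length swap such as _fopen to _mopen works with sed.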

Related

Is there a way to specify the base address of a shared library using dlopen()?

It seems that when we dlopen() some libraries, they will be loaded into some preferred (but not fixed) addresses. I've checked the source code of dlopen(), and a core function says
static __always_inline const char *
_dl_map_segments (struct link_map *l, int fd,
                  const ElfW(Ehdr) *header, int type,
                  const struct loadcmd loadcmds[], size_t nloadcmds,
                  const size_t maplength, bool has_holes,
                  struct link_map *loader)
{
  const struct loadcmd *c = loadcmds;
  if (__glibc_likely (type == ET_DYN))
    {
      /* This is a position-independent shared object. We can let the
         kernel map it anywhere it likes, but we must have space for all
         the segments in their specified positions relative to the first.
         So we map the first segment without MAP_FIXED, but with its
         extent increased to cover all the segments. Then we remove
         access from excess portion, and there is known sufficient space
         there to remap from the later segments.
         As a refinement, sometimes we have an address that we would
         prefer to map such objects at; but this is only a preference,
         the OS can do whatever it likes. */
      ElfW(Addr) mappref
        = (ELF_PREFERRED_ADDRESS (loader, maplength,
                                  c->mapstart & GLRO(dl_use_load_bias))
           - MAP_BASE_ADDR (l));
      /* Remember which part of the address space this object uses. */
      l->l_map_start = (ElfW(Addr)) __mmap ((void *) mappref, maplength,
                                            c->prot,
                                            MAP_COPY|MAP_FILE,
                                            fd, c->mapoff);
      if (__glibc_unlikely ((void *) l->l_map_start == MAP_FAILED))
        return DL_MAP_SEGMENTS_ERROR_MAP_SEGMENT;
      ...
}
The comment says you can specify a preferred address, but the OS decides whether to use it.
Question
Is there any way we can specify the base address for each dlopened module?
ELF_PREFERRED_ADDRESS is set to 0 by default, but this macro seems to imply that the preferred addresses can be changed, say by an environment variable? But even if there is one, I doubt that it can be changed for each dlopened library.
If I want to implement this feature myself, it seems that I need to wrap a new dlopen function and pass the preferred address to the above core function (and use MAP_FIXED maybe). Is that correct?
Thanks!
Is there any way we can specify the base address for each dlopened module?
No.
ELF_PREFERRED_ADDRESS is set to 0 by default, but this macro seems to imply that the preferred addresses can be changed, say by an environment variable? But even if there is one, I doubt that it can be changed for each dlopened library.
This code is compiled into the dynamic loader ld-linux.so and cannot be changed after compilation.
If I want to implement this feature myself, it seems that I need to wrap a new dlopen function and pass the preferred address to the above core function (and use MAP_FIXED maybe). Is that correct?
The function is private to ld-linux. You will not be able to wrap it or call it from outside of ld-linux.
P.S. What you are likely looking for is the prelink command.
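The "only a preference" wording in the glibc comment quoted in the question reflects how mmap itself behaves: without MAP_FIXED, the address argument is just a hint. Below is a minimal sketch of that behavior, assuming Linux or another POSIX system that provides MAP_ANONYMOUS. It does not touch the loader, and map_with_hint and hint_was_honored are hypothetical helper names.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std=c11 on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Ask for an anonymous mapping at `hint` without MAP_FIXED, analogous to how
   _dl_map_segments passes mappref. The kernel may honor the hint or pick
   another address. Returns the mapped address, or NULL on failure. */
void *map_with_hint(void *hint, size_t length)
{
    assert(length > 0);
    void *p = mmap(hint, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 1 if the hint was honored, 0 if the kernel chose another address,
   -1 if mmap failed. The mapping is released before returning. */
int hint_was_honored(void *hint, size_t length)
{
    void *p = map_with_hint(hint, length);
    if (p == NULL)
        return -1;
    int honored = (p == hint);
    munmap(p, length);
    return honored;
}
```

Whether a given hint is honored depends on what is already mapped there, so a caller cannot rely on the result, which is the same position dlopen is in.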

How to create bindings qml to d?

I want to use QML with the D language. There are no bindings for D, so I want to create them, but I don't know how to begin. Please tell me how to start creating bindings.
Since nobody answered:
From what I understand, QML is the modelling language of Qt, and I guess it depends heavily on Qt. I assume here that it depends on Qt, at least to some extent.
First of all, there was already an attempt to bind Qt to D: http://www.dsource.org/projects/qtd, but from what I've heard this project is more or less dead and no longer developed (last commit 2 years ago). You could use it as a base for your work or as a reference on how to bind QML and Qt.
1. Option, a C/C++ Glue-Layer
A C glue layer means you basically write your code twice. You write a complete C++-to-C wrapper in C++ (the language that can interface directly with Qt and QML). That means you wrap every method of a class inside a C function that takes a pointer (to a struct representing this C++ Qt class). This could look like this (note that this is an abstraction of GtkWebkit, which is written in C, but the snippet demonstrates the approach quite well):
// somewhere in a header
typedef struct SurfiClient {
    GtkWidget *window; // Offscreen window
    // ....
} SurfiClient;
typedef GdkPixbuf Pixbuf;
extern "C" {
    Pixbuf* surfi_client_get_pixbuf(SurfiClient* client)
    {
        // in C++, gtk_offscreen_window_get_pixbuf would be a method of client->window
        return gtk_offscreen_window_get_pixbuf(GTK_OFFSCREEN_WINDOW(client->window));
    }
    // here go the rest of these functions, probably thousands
}
You basically have to do this for everything you later want to interface with from the D side. Even worse, you also have to do it for namespaces and free functions that are not marked extern "C"; this could look like this (libsquish):
typedef unsigned char u8;
extern "C" {
    void CompressMasked(u8 const* rgba, int mask, void* block, int flags) {
        squish::CompressMasked(rgba, mask, block, flags);
    }
}
As you can see by now, this is quite tedious...
Let's assume you have finished making the C/C++ Glue-Layer, now you have to create code in D which can interface it.
To stay with the gtk example:
extern(C) {
    // Using an opaque struct is one option
    struct SurfiClient;
    // the other is to wrap the struct correctly
    struct SurfiClient {
        GtkWidget *window;
    }
    // The Pixbuf was only a typedef to GdkPixbuf which is already an opaque data structure, easy
    struct Pixbuf;
    Pixbuf* surfi_client_get_pixbuf(SurfiClient* client);
}
This example shows a problem: if you want to wrap the SurfiClient struct correctly, you also have to wrap GtkWidget, or do it incorrectly and use void* instead of GtkWidget*, which is no real solution to the problem. You will most likely run into this as well: your glue-layer struct has members that you have no abstractions for. I would go with the opaque struct here and provide functions for the members the user really needs.
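The opaque-struct approach can be sketched on the C side as follows. This is a minimal illustration with hypothetical names (Client, client_new, client_get_width, client_free), not the API of any real glue layer. The D side would then only need an opaque struct Client; plus the three prototypes inside an extern(C) block.

```c
#include <assert.h>
#include <stdlib.h>

/* Public part (what a header would expose): the struct is only declared,
   so callers, including D code, can hold a Client* but cannot see its fields. */
typedef struct Client Client;
Client *client_new(int width, int height);
int client_get_width(const Client *c);
void client_free(Client *c);

/* Private part (glue-layer source file): the full definition may contain
   toolkit types that were never wrapped for D. */
struct Client {
    void *window; /* stand-in for e.g. a GtkWidget* */
    int width;
    int height;
};

/* Allocates a client; returns NULL if allocation fails. */
Client *client_new(int width, int height)
{
    Client *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->window = NULL;
    c->width = width;
    c->height = height;
    return c;
}

/* Accessor for a member the user actually needs. */
int client_get_width(const Client *c)
{
    assert(c != NULL);
    return c->width;
}

void client_free(Client *c)
{
    free(c);
}
```

Because the binding never sees the struct layout, members can be added or reordered in the glue layer without touching the D declarations.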
I am not going more into detail on how to interface with C, there are already a few guides:
http://dlang.org/interfaceToC.html
http://www.gamedev.net/blog/1140/entry-2254003-binding-d-to-c/
https://github.com/D-Programming-Deimos (Not a guide but a collection of C bindings, could be used as reference)
The last step in the process of making Qt/QML bindings would be to rebuild the OOP API in D on top of your newly made C bindings.
2. Option, SWIG/Binding generator
I am not an expert with SWIG, which is why I am only covering it in a few sentences.
You can use SWIG to generate the whole C/C++ glue layer for you. If you're lucky, your SWIG file consists of only a few includes of Qt headers and SWIG does all the work for you. If not, you have to define rules for classes and functions on your own, which can be tedious (but still easier and faster than Option 1). So SWIG is definitely worth a try!
As a side note: if you have a template-heavy, macro-heavy, or header-only C++ library like glm, SWIG can be tricky or, in the case of glm, not an option.
There are alternative binding generators, e.g. the PySide project started with Boost.Python and then switched to Shiboken. I don't know how easily you can generate bindings with Shiboken for anything other than CPython; maybe hacking on Shiboken or even Boost.Python could work? Also worth a read: http://setanta.wordpress.com/binding-c/.
QtD used QtJambi so this might be a good start.
3. Option, the D way
D has the great idea of extern(C++), which in theory allows making C++/D bindings without such a glue layer: http://dlang.org/cpp_interface.html.
Nice idea, but unfortunately too limited. E.g. there is no support for namespaces yet (there is an open issue on Bugzilla that I can't find right now). In my opinion extern(C++) is too limited for Qt.
Manu Evans mentioned in his first talk at the D conference how to bind to C++ from D successfully using D's metaprogramming capabilities.
In a nutshell
A C/C++ glue layer gives you the most flexibility and it will work, but it is neither a simple nor a short task (I would do this only for rather small projects).
SWIG/binding generators are the way I would go for Qt; once set up correctly, they do all the work for you (in the best case).
extern(C++) is a nice idea, but too limited for most serious C++ projects.
I hope this gives you a short overview of what you can do and the amount of work it requires.
There is already an article on this subject: how to interface C code to D.
Usually it is not hard. Take the function declaration and put it into an extern(C) block.
These modules are usually placed in a c package. Example:
src/
`-- appName
    |-- c
    |   `-- dInterface.d
    `-- dwrapper.d
The module appName.c.dInterface declares the C functions in an extern(C) block,
while the module appName.dwrapper provides an interface that fits better with idiomatic D.

How do I change Closure Compiler compile options not exported to command line?

I found that some options in CompilerOptions are not exposed on the command line.
For example, alias all strings is available in the Closure Compiler's Java API (CompilerOptions), but I have no idea how to set it on the command line.
I know I can create a new Java class, like:
Compiler c = new Compiler();
CompilerOptions opt = new CompilerOptions();
opt.setAliasAllStrings(true);
c.compile(.....);
However, then I have to handle the command line args myself.
Is there a simpler way?
============================
In order to try the alias all strings option, I wrote a simple command line application based on compiler.jar.
However, I found that the result I get when I enable alias all strings is not what I expected.
For example:
a["prototype"]["say"]=function(){
var a="something string";
}
Given the above code, the something string will be replaced by a variable like this:
var xx="something string";
....
var a=xx;
....
This is fine, but what about the string "say"? How does the Closure Compiler know whether it should be aliased (replaced by a variable) or exported (kept as the method name)?
This is the compiled code now:
a.prototype.say=function(){....}
It seems that it exports it.
What I want is this:
var a="prototype",b="say",c="something string";
xx[a][b]=function(){.....}
This is, in fact, how the Google Maps API appears to be compiled.
Is this possible?
Not all options are available from the command line - this includes aliasAllStrings. For some of them you have the following options:
Build a custom version of the compiler
Use the Java API (see example).
Use plovr
Getting the same level of compression and obfuscation as the Maps API requires code written specifically for the compiler. When properly written, you'll see property and namespace collapsing, prototype aliasing and a whole host of others. For an example of the style of code that will optimize that way, take a look at the Closure Library.
Modifying http://code.google.com/p/closure-compiler/source/browse/trunk/src/com/google/javascript/jscomp/CompilationLevel.java?r=706 is usually easy enough if you just want to play with something.
Plovr (a Closure build tool) provides an option called experimental-compiler-options, which is documented as follows:
The Closure Compiler contains many options that are only available programmatically in Java. Many of these options are experimental or not finalized, so they may not be a permanent part of the API. Nevertheless, many of them will be useful to you today, so plovr attempts to expose these via the experimental-compiler-options option. Under the hood, it uses reflection in Java, so it is fairly hacky, but in practice, it is a convenient way to experiment with Closure Compiler options without writing Java code.

In Qt, how to setCodecForCStrings globally (in header)?

I'm developing a bunch of Qt applications in C++ and they all use some modules (translation units) for common functionality that use Qt as well.
Whenever I convert a C string (implicit conversion) or C++ string object (fromStdString()) to a QString object, I expect the original data to be UTF-8 encoded and vice versa (toStdString()).
Since the default is Latin-1, I have to set the codec "manually" (in the init procedure of every one of my programs) to UTF-8:
QTextCodec::setCodecForCStrings(QTextCodec::codecForName("utf8"));
Not all of my modules have an init procedure. The ones containing a class do (I can put this line in the class constructor), but some modules just contain a namespace with lots of functions. So there's no place for setCodecForCStrings(). Whenever I convert from/to a QString implicitly (from within one of my modules), I rely on the codec being already set by the init procedure of the main program, which seems to be a rather bad solution.
Is there a reliable way to set the codec to UTF-8 in my modules, or will I just have to be very careful not to use implicit conversions at all (in my modules at least) and write something like std::string(q.toUtf8().constData()) instead of q.toStdString()?
This can be done using a class definition for an automatically instantiated singleton-similar class having some init code in its constructor:
#include <QTextCodec>

class ModuleInit {
    static ModuleInit *instance;
public:
    ModuleInit() {
        QTextCodec::setCodecForCStrings(QTextCodec::codecForName("utf8"));
    }
};
ModuleInit * ModuleInit::instance = new ModuleInit(); // put this line in .cpp!
Putting this code anywhere into any project will set the text codec to UTF-8. You can, however, override the text codec in main(), since the code above is executed before main().
With "anywhere" I of course mean places where it is syntactically allowed to. Put the class definition in some header file, and the instantiation in the appropriate .cpp file.
You can even put the whole code in one .cpp file, say moduleinit.cpp which you compile and link, but never explicitly use (there is no header file you could include...). This will also avoid accidental instantiations except the very first one and you will not have any problems with duplicate class names.
Note that you can't set the codec for C strings in one particular file. Setting the codec using QTextCodec::setCodecForCStrings will set it application-wide!

ld linker question: the --whole-archive option

The only real use of the --whole-archive linker option that I have seen is in creating shared libraries from static ones. Recently I came across Makefiles that always use this option when linking with in-house static libraries. This of course causes the executables to unnecessarily pull in unreferenced object code. My reaction was that this is plain wrong; am I missing something here?
The second question I have has to do with something I read regarding the --whole-archive option but couldn't quite parse. Something to the effect that --whole-archive should be used while linking with a static library if the executable also links with a shared library which in turn has (in part) the same object code as the static library. That is, the shared library and the static library overlap in terms of object code. Using this option would force all symbols (regardless of use) to be resolved in the executable. This is supposed to avoid object code duplication. This is confusing: if a symbol is referenced in the program, it must be resolved uniquely at link time, so what is this business about duplication? (Forgive me if this paragraph is not quite the epitome of clarity.)
Thanks
There are legitimate uses of --whole-archive when linking executable with static libraries. One example is building C++ code, where global instances "register" themselves in their constructors (warning: untested code):
handlers.h
typedef void (*handler)(const char *data);
void register_handler(const char *protocol, handler h);
handler get_handler(const char *protocol);
handlers.cc (part of libhandlers.a)
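#include "handlers.h"
#include <map>
#include <string>

// Keyed by std::string so lookups compare protocol names, not pointer values
typedef std::map<std::string, handler> HandlerMap;
// Function-local static: already constructed when other files' global constructors register
static HandlerMap &handlers() {
    static HandlerMap m;
    return m;
}
void register_handler(const char *protocol, handler h) {
    handlers()[protocol] = h;
}
handler get_handler(const char *protocol) {
    HandlerMap::iterator it = handlers().find(protocol);
    if (it == handlers().end()) return nullptr;
    return it->second;
}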
http.cc (part of libhttp.a)
#include "handlers.h"
class HttpHandler {
public: // the constructor must be accessible to define the global instance below
    HttpHandler() { register_handler("http", &handle_http); }
    static void handle_http(const char *) { /* whatever */ }
};
HttpHandler h; // registers itself with main!
main.cc
#include "handlers.h"
int main(int argc, char *argv[])
{
    for (int i = 1; i < argc-1; i+= 2) {
        handler h = get_handler(argv[i]);
        if (h != nullptr) h(argv[i+1]);
    }
}
Note that there are no symbols in http.cc that main.cc needs. If you link this as
g++ main.cc -lhttp -lhandlers
you will not get an http handler linked into the main executable, and will not be able to call handle_http(). Contrast this with what happens when you link as:
g++ main.cc -Wl,--whole-archive -lhttp -Wl,--no-whole-archive -lhandlers
The same "self registration" style is also possible in plain-C, e.g. with the __attribute__((constructor)) GNU extension.
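A minimal plain-C sketch of that style, assuming GCC or Clang (the constructor attribute is a GNU extension). Everything lives in one file here for brevity. register_handler and get_handler mirror the C++ example, while the fixed-size table and register_http are hypothetical. Once the registering file is moved into a static library, the same --whole-archive caveat applies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*handler)(const char *data);

#define MAX_HANDLERS 16
/* Zero-initialized static storage, so it is ready before any constructor runs. */
static struct { const char *protocol; handler h; } table[MAX_HANDLERS];
static size_t n_handlers;

/* Returns 0 on success, -1 if the table is full. */
int register_handler(const char *protocol, handler h)
{
    assert(protocol != NULL && h != NULL);
    if (n_handlers == MAX_HANDLERS)
        return -1;
    table[n_handlers].protocol = protocol;
    table[n_handlers].h = h;
    n_handlers++;
    return 0;
}

/* Compares protocol names by content, not by pointer. */
handler get_handler(const char *protocol)
{
    for (size_t i = 0; i < n_handlers; i++)
        if (strcmp(table[i].protocol, protocol) == 0)
            return table[i].h;
    return NULL;
}

static void handle_http(const char *data) { (void)data; /* whatever */ }

/* Runs before main(); nothing else in the program references it, which is
   why the linker would skip its object file in a plain static-library link. */
__attribute__((constructor))
static void register_http(void)
{
    register_handler("http", handle_http);
}
```

By the time main() starts, get_handler("http") already returns handle_http even though no code calls register_http explicitly.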
Another legitimate use for --whole-archive is for toolkit developers to distribute libraries containing multiple features in a single static library. In this case, the provider has no idea what parts of the library will be used by the consumer and therefore must include everything.
An additional good scenario in which --whole-archive is well-used is when dealing with static libraries and incremental linking.
Let us suppose that:
libA implements the a() and b() functions.
Some portion of the program has to be linked against libA only, e.g. due to some function wrapping using --wrap (a classical example is malloc)
libC implements the c() function and uses a()
the final program uses a() and c()
Incremental linking steps could be:
ld -r -o step1.o module1.o --wrap malloc --whole-archive -lA
ld -r -o step2.o step1.o module2.o --whole-archive -lC
cc step2.o module3.o -o program
Omitting --whole-archive would drop c() from the intermediate object, even though the final program uses it, and the final link would fail.
Of course, this is a particular corner case in which incremental linking must be done to avoid wrapping all calls to malloc in all modules, but is a case which is successfully supported by --whole-archive.
I agree that using --whole-archive to build executables is probably not what you want (due to linking in unneeded code and creating bloated software). If they had a good reason to do so they should have documented it in the build system, as now you are left to guessing.
As to your second part of the question. If an executable links both a static library and a dynamic library that has (in part) the same object code as the static library then the --whole-archive will ensure that at link time the code from the static library is preferred. This is usually what you want when you do static linking.
Old query, but on your first question ("Why"), I've seen --whole-archive used for in-house libraries as well, primarily to sidestep circular references between those libraries. It tends to hide poor architecture of the libraries, so I'd not recommend it. However it's a fast way of getting a quick trial working.
For your second query: if the same symbol is present in both a shared object and a static library, the linker will satisfy the reference with whichever library it meets first.
If the shared library and the static library contain exactly the same code, this may all just work. But where they have different implementations of the same symbols, your program will still link but will behave differently depending on the order of the libraries.
Forcing all symbols to be loaded from the static library is one way of removing confusion as to what is loaded from where. But in general this sounds like solving the wrong problem; you mostly won't want the same symbols in different libraries.

Resources