Change in Binary without change in Source Code - unix

I have the following requirement: to find out whether my binary has changed or not.
My source code is unchanged. When I recompile the binary (without any change in the source code), I notice that the binary has changed: not in size, but in contents.
After debugging a little, I found that there is something called "Link Time" inside the binary file. This is the actual timestamp of when the binary was linked. Since each build produces a different timestamp, my binary contents are always different, even though they should be the same.
Can somebody suggest a way of finding out whether the binary has actually changed because of a change in the source code, and not because of anything else?
Thanks

Unlike on Windows (where every .obj file has a compile timestamp in its file header), UNIX object files, and in particular ELF files do not encode any kind of timestamp.
However, if your source uses __TIME__ and __DATE__ macros, then the object file produced by compilation will obviously change. Also, all kinds of information, including compilation timestamp could be recorded as part of the debug info, if you are building -g binaries.
Finally, it's possible that the linker you are using does record the link timestamp (as a vendor extension).
Your first task should be to understand where the differences from one build to the next come from.
If from __DATE__ and __TIME__, eliminate them from your source.
If from debug info, compare the binaries after passing them through strip -g.
If from vendor linker extension, see if there is a flag to disable such timestamps. If there isn't one, you'll have to write a tool that compares only the parts you are interested in. E.g. you could use readelf -x.text a.out, etc. to compare only the .text section (you'll also want to compare .data, .rodata, and likely many others).
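As a rough illustration of comparing only the parts you are interested in, here is a minimal Python sketch that hashes a few named sections of an ELF file and ignores everything else. It assumes 64-bit little-endian ELF, and the names section_digests and same_code are made up for this example; a real tool would more likely drive readelf or objcopy and would cover more sections.

```python
import hashlib
import struct

# ELF64 file header and section header layouts (little-endian).
ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")
SHT_NOBITS = 8  # sections such as .bss occupy no bytes in the file


def section_digests(data, wanted=(".text", ".data", ".rodata")):
    """Return {section name: SHA-256 hex digest} for the wanted sections."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    fields = ELF_HEADER.unpack_from(data, 0)
    shoff, shentsize, shnum, shstrndx = fields[6], fields[11], fields[12], fields[13]
    headers = [SECTION_HEADER.unpack_from(data, shoff + i * shentsize)
               for i in range(shnum)]
    strtab_off = headers[shstrndx][4]  # file offset of the section name table
    digests = {}
    for name_off, sh_type, _, _, offset, size, *_ in headers:
        start = strtab_off + name_off
        name = data[start:data.index(b"\0", start)].decode("ascii", "replace")
        if name in wanted and sh_type != SHT_NOBITS:
            digests[name] = hashlib.sha256(data[offset:offset + size]).hexdigest()
    return digests


def same_code(path_a, path_b):
    """True if the selected sections of both files are byte-for-byte equal."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return section_digests(a.read()) == section_digests(b.read())
```

Calling same_code("old/a.out", "new/a.out") (hypothetical paths) would then report whether the chosen sections match, regardless of a timestamp stored elsewhere in the file.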

Related

deterministic mode in ranlib in gnu utilities

I was reading about ranlib, which generates or updates the index of an archive's contents.
Among the options you can pass to ranlib are -D and -U.
I read their definitions but could not understand them. This is what they say:
-D
Operate in deterministic mode. The symbol map archive member’s header will show zero for the UID, GID, and timestamp. When this option is used, multiple runs will produce identical output files.
If binutils was configured with --enable-deterministic-archives,
Can anyone provide a simple explanation of these two ranlib options (-D and -U), and of why someone would need to use them?
There is an effort ongoing in many distributions to make all builds of software from source to binary "deterministic", which means in this context that no matter who performs the build or when they do it, the binary you get out will be byte-for-byte identical to anyone else's build.
The goal is to allow verification of binaries via checksums, for verifying signatures, etc.
Needless to say this is a huge amount of work across many tools, and assumes you're using a predefined version of the compiler, runtime libraries, etc.
The POSIX archive library format (the format of libfoo.a files) is basically a collection of object files plus a table of contents (the symbol index). Every member of the archive, the index included, carries a header that by default records a timestamp, user ID, and group ID. Clearly, preserving this information in libfoo.a makes the output depend on who built it and when, so it is not byte-for-byte reproducible.
So people who care about deterministic builds should use the -D option, which writes 0 into those fields instead of the real values. People who don't care about deterministic builds should use the -U option, which uses the real values.
Be aware that if you use the -D option with ranlib you'll break make's library updating feature, which relies on examining the timestamps of object files from inside the library archive.
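To make those header fields concrete, here is a minimal Python sketch that walks the member headers of an ar archive and checks whether the timestamp, UID, and GID are all zero, which is what deterministic mode is described as producing. The function names archive_members and is_deterministic are invented for this example, and the sketch relies only on the common 60-byte ar member header layout; it does not resolve GNU long names or BSD extended names.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name 16, mtime 12, uid 6, gid 6, mode 8, size 10, end 2


def _number(field):
    """Parse a space-padded decimal header field; blank means 0."""
    text = field.decode("ascii").strip()
    return int(text) if text else 0


def archive_members(data):
    """Yield (name, mtime, uid, gid) for every member header in the archive."""
    if data[:8] != AR_MAGIC:
        raise ValueError("not an ar archive")
    pos = 8
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        name = header[0:16].decode("ascii", "replace").rstrip()
        size = _number(header[48:58])
        yield name, _number(header[16:28]), _number(header[28:34]), _number(header[34:40])
        pos += HEADER_SIZE + size + (size % 2)  # member data is padded to an even length


def is_deterministic(data):
    """True if every member header has zero timestamp, UID, and GID."""
    return all(mtime == 0 and uid == 0 and gid == 0
               for _, mtime, uid, gid in archive_members(data))
```

Running is_deterministic on the bytes of two builds of the same libfoo.a is a quick way to see whether these fields are what makes the files differ.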

Can we hide/obscure symbol names in the symbol table of ELF executable object file?

According to this ELF specification: ELF object file contains various sections and one of them is symbol table section .symtab which contains information of all symbols (files, functions, objects etc).
ELF contains information like name, attribute flags, type, value and binding etc for each symbol in the symbol table.
The name of an object for a file, function or object (array, variable, string) etc. actually exposes the internal information of the code. This way any person can analyze an ELF (using strings, objdump or readelf tools) and see this information and get idea of things internal to the code which should be kept secret.
For readability and maintainability we write code which can be understood by the developers. So, we need to keep using proper file names and variable names etc. We cannot obscure them using code obfuscation as it will make it difficult to maintain.
Question (edited): Is there any way by which we can hide or remove symbol "names" from the symbol table of executable ELF so that no one can get the insight of code and the executable is still operational?
Is there any way by which we can hide or obscure symbol names in the symbol table of ELF so that no one can get the insight of how code is developed (without code obfuscation)?
Depends on what kind of ELF file you are shipping to the end user.
If you are shipping a fully-linked ELF executable, running strip a.out will remove the symbol table completely (but not the dynamic symbol table, which must remain because it is needed at run time to resolve symbols against shared libraries).
If you are shipping an ELF shared library, you need to carefully control its exposed API using -fvisibility=hidden or a linker version script. If you do, strip will again remove everything except your public API.
If you are shipping a relocatable ELF object (or an archive library), then you can't do anything about its symbol table (again for an obvious reason: the symbol table is used to perform the final link).
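To check what a given file still exposes after stripping, one option is to look at which symbol-table sections it contains. The following Python sketch is only an illustration: it assumes 64-bit little-endian ELF, and the function name symbol_tables is made up here. In practice readelf -S gives the same information.

```python
import struct

SHT_SYMTAB = 2   # full symbol table (.symtab), the one strip removes
SHT_DYNSYM = 11  # dynamic symbol table (.dynsym), needed at run time


def symbol_tables(data):
    """Report which kinds of symbol table an ELF64 file contains."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    (shoff,) = struct.unpack_from("<Q", data, 0x28)           # e_shoff
    shentsize, shnum = struct.unpack_from("<HH", data, 0x3A)  # e_shentsize, e_shnum
    types = set()
    for index in range(shnum):
        # sh_type is the second 32-bit field of each section header.
        (sh_type,) = struct.unpack_from("<I", data, shoff + index * shentsize + 4)
        types.add(sh_type)
    return {"symtab": SHT_SYMTAB in types, "dynsym": SHT_DYNSYM in types}
```

For a stripped, dynamically linked executable you would expect "symtab" to be False and "dynsym" to be True.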
Finally, your question appears to be predicated on a misconception:
"We cannot obscure them using code obfuscation as it will make it difficult to maintain."
The usual way to apply code obfuscation is just before you make the final shipping product (i.e. at exactly the same point where you would use strip, or any other method that would hide implementation details). Applying code obfuscation at that point will make the result difficult to maintain in exactly the same way as any other method of hiding implementation details.
Notably, you don't (usually) apply obfuscation to the code under development and maintenance (i.e. your development builds remain un-obfuscated).
Yes, it is possible. You can use strip to remove static library symbols and you can remove dynamic library symbols by loading the library yourself instead of letting the OS do this automatically.

Unix touch command usage

I know you can use touch to create a new empty file.
I just learned that touch can also be used to update the access and modification times of a file. I don't quite see in what situations, and why, you would need to do that, i.e. what is this particular function useful for?
Thanks!
Some utilities depend on the timestamp of a file.
For example, make uses timestamps to decide whether it needs to do something (usually build), by comparing the timestamp of the source code with that of the output (executable, object files, ...).
By touching the source file and then running make, you can force a rebuild.
In addition, touch has a -d option that can fake the modification time.
If you know what you are doing, you can use it to avoid long build times caused by unnecessary recompilation.
For example, when adding a declaration to a common header file that does not change any old API, you can fake the header's real modification time and bypass the Makefile's dependencies.
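The timestamp rule that make applies can be sketched in a few lines of Python. This is only an illustration of the idea, not how make is implemented, and the function names needs_rebuild and touch are chosen for this example.

```python
import os


def needs_rebuild(target, source):
    """Mimic make's basic rule: rebuild if the target is missing or older than the source."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)


def touch(path, mtime=None):
    """Create the file if needed, then set its access and modification times.

    With mtime=None the current time is used (plain touch); passing a
    number of seconds since the epoch fakes the time, like touch -d.
    """
    open(path, "a").close()
    os.utime(path, None if mtime is None else (mtime, mtime))
```

Touching a source file with the current time makes needs_rebuild return True for every target built from it, while giving a header an mtime older than the targets keeps them looking up to date.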

Under what conditions on Unix can gtk_file_chooser_get_filename() return NULL signifying a non-local filename?

From the documentation for gtk_file_chooser_get_filename():
The currently selected filename, or NULL if no file is selected, or the selected file can't be represented with a local filename. Free with g_free().
Is there at least one situation where the last condition (the selected file can't be represented with a local filename) is true on a Unix system (Linux, the various BSDs, etc.)? I tried reading through the source code but got lost/confused. I'd like to know so I can determine if I need to handle it in some special way; I don't need to know every possibility for this.
Thanks.
I haven't yet read through the source either, but I would guess that gtk_file_chooser_get_filename() essentially returns g_file_get_path (gtk_file_chooser_get_file (...)). Probably the only case in which you would need to care about the filename being NULL is if your file chooser is enabled to pick files from a network share, for example. It's probably not something you need to worry about if you set the local-only property on your file chooser.
However, it's probably good practice to use gtk_file_chooser_get_file() anyway, since you will transparently be able to handle non-local files if you have the proper GVFS modules installed.

Mass Thunderbird folder to Gnus nnfolder conversions

I'm pondering the idea of importing a few thousand Thunderbird folders, each folder containing many emails of course, as a set of Emacs' Gnus mailgroups. Each mailgroup name would be derived from the folder hierarchy. Because of the quantity, the work is going to be fairly tedious, so I would automate this massive import if possible.
Among the available backends, nnfolder seems the most promising in this case. I presume it would be better to populate the mailgroups from within Gnus. Otherwise, I would have to thoroughly understand the nnfolder format, and this might require many iterations before I really get it right. Moreover, as email continues to flow in, iterations may become difficult to properly organize without losing anything.
I guess I have to respool everything, under the constraint that the selected mailgroup is a function of the Thunderbird origin, overriding the standard Gnus selection mechanism. I did some Gnus coding in the past, but since I did not touch Emacs for a dozen years, it is all very rusty. I'm a bit lost about how to approach this task as efficiently and quickly as possible. So my question: how would you handle it? Or is there some clever Gnus hidden corner that I should explore more deeply? :-)
François
P.S. After I wrote this question, I found out that Gnus has a nice function that helps towards this goal. The idea is to first copy all Thunderbird folder files into the ~/Mail directory, with their contents as they are, but properly renamed. Once this is done, M-x nnfolder-generate-active-file does everything at once for each copied folder: it edits the contents, leaves a ~ backup, generates NOV data, creates one mailgroup and, of course, adjusts the ~/Mail/active file.
To copy the folders underneath the ~/.thunderbird/LOGIN/Mail/Local Folders/ directory, I wrote a small Python script. It ignores all .msf files, and recurses into .sbd directories. The folder path name, relative to Local Folders/, has all its .sbd/ strings turned into periods to produce the mailgroup name, also lowering case, turning spaces and underscores into dashes, and handling other special characters appropriately. However, non-ASCII characters are not handled properly: nnfolder confuses UTF-8 and ISO-8859-1 here and there. The script also has to skip msgfilterrules.dat and likely drafts, junk and such things.
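The original script is not shown, so the following is only a guess at its core, reconstructed from the description above: the name mapping and the directory walk. The function names group_name and folder_files are invented here, and the handling of other special characters, drafts and junk folders is deliberately left out.

```python
import os

# Files that are not mail folders; drafts, junk and so on would need adding.
SKIPPED_FILES = {"msgfilterrules.dat"}


def group_name(relative_path):
    """Turn a path relative to 'Local Folders/' into a mailgroup name."""
    name = relative_path.replace(".sbd/", ".").lower()
    return name.replace(" ", "-").replace("_", "-")


def folder_files(root):
    """Yield (full path, mailgroup name) for each folder file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename.endswith(".msf") or filename.lower() in SKIPPED_FILES:
                continue
            full = os.path.join(dirpath, filename)
            relative = os.path.relpath(full, root).replace(os.sep, "/")
            yield full, group_name(relative)
```

Each yielded file would then be copied into ~/Mail under its mailgroup name before running M-x nnfolder-generate-active-file. Note that os.walk descends into every subdirectory, not only those ending in .sbd, which is a simplification.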
I noticed two details requiring attention:
Thunderbird itself can be used to compact folders before copying them, otherwise one might unwillingly recover messages which were already deleted.
(setq nnmail-use-long-file-names t) is needed in ~/.emacs prior to the whole operation.
The batch transformation aborted, saying it was not able to decrypt one of the messages. I moved the offending folder out of the way, and then the lengthy operation succeeded.

Resources