Deterministic mode in ranlib in GNU utilities - unix

I was reading about ranlib, which generates or updates the index of the contents of an archive.
Among the options you can pass to ranlib are -D and -U.
I read the definition but could not understand it. This is what the documentation says:
-D
Operate in deterministic mode. The symbol map archive member’s header will show zero for the UID, GID, and timestamp. When this option is used, multiple runs will produce identical output files.
If binutils was configured with --enable-deterministic-archives,
Can anyone provide a simple explanation of these two ranlib options (-D and -U), and why someone would need to use them?

There is an effort ongoing in many distributions to make all builds of software from source to binary "deterministic", which means in this context that no matter who performs the build or when they do it, the binary you get out will be byte-for-byte identical to anyone else's build.
The goal is to allow binaries to be verified via checksums, signatures, etc.
Needless to say this is a huge amount of work across many tools, and assumes you're using a predefined version of the compiler, runtime libraries, etc.
The archive library format (the format of libfoo.a files) is basically a collection of object files plus a table of contents (the symbol index). Each member of the archive, including the index that ranlib writes, has a header that by default records a timestamp, user ID, and group ID. Preserving this information in the libfoo.a file makes the output depend on who built it and when, so it is not byte-for-byte identical between builds.
People who care about deterministic builds should use the -D option, which writes 0 into those fields instead of the real values. People who don't care about deterministic builds can use the -U option, which uses the real values.
Be aware that if you use the -D option with ranlib you'll break make's library updating feature, which relies on examining the timestamps of object files from inside the library archive.
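To make the effect of -D concrete, here is a minimal Python sketch of the 60-byte header that precedes each member of an ar archive. In the GNU archive format the symbol index that ranlib writes is the member named "/". The function name member_header and the sample values (size 128, timestamp 1700000000, UID/GID 1000) are made up for illustration; this is not how binutils itself is implemented.

```python
def member_header(name, size, mtime=0, uid=0, gid=0, mode=0o644):
    """Build the 60-byte header that precedes each member in an ar archive."""
    fields = (
        f"{name:<16}"   # member name, space padded
        f"{mtime:<12}"  # modification time, decimal seconds
        f"{uid:<6}"     # owner user ID, decimal
        f"{gid:<6}"     # owner group ID, decimal
        f"{mode:<8o}"   # file mode, octal
        f"{size:<10}"   # member size in bytes, decimal
    )
    return fields.encode("ascii") + b"`\n"  # two-byte end-of-header marker


# -D style: timestamp, UID and GID are all zero, so every run matches.
det_a = member_header("/", 128)
det_b = member_header("/", 128)

# -U style: hypothetical "real" values that change with build time and user.
real = member_header("/", 128, mtime=1700000000, uid=1000, gid=1000)

print(det_a == det_b)  # identical headers across runs
print(det_a == real)   # differs as soon as real metadata is recorded
```

With zeroed fields, two runs produce the same bytes; with real values, the header changes whenever the build time or the building user changes.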

Related

How to obfuscate lua code?

I can't find anything on Google about a tool that encrypts or obfuscates my Lua files, so I decided to ask here. Maybe some professional knows how to do it? (For free.)
I have made a simple game in Lua and I don't want people to see the code, otherwise they can easily cheat. How can I turn the whole text inside the .lua file into just random letters and such?
I used to program in C# and I had a .NET obfuscator called SmartAssembly which works pretty well. When someone tried to inspect the code of my applications, it would just be a bunch of letters and numbers together with Chinese characters and so on.
Does anyone know of a program that can do this for Lua as well? Just load the file to encrypt, click Encrypt or something, and bam, it works!
For example this:
print('Hello world!')
would turn into something like
sdf9sd###&/sdfsdd9fd0f0fsf/&
Just precompile your files (chunks) and load the binary chunks. luac allows you to strip debugging info. If that is not enough, define your own transformations on the compiled Lua, stripping names where possible. There's not really much demand for Lua obfuscators though...
Also, you lose one of the main advantages of using an embedded scripting language: extensibility.
The simplest obfuscation option is to compile your Lua code as others suggested, however it has two major issues: (1) the strings are still likely to be easily visible in your compiled code, and (2) the compiled code for Lua interpreter is not portable, so if you target different architectures, you need to have different compiled chunks for them.
The first issue can be addressed by using a pre-processor that (for example) converts your strings to a sequence of numbers and then concatenates them back at run-time.
The second issue is not easily addressed without changes to the interpreter, but if you have a choice of interpreters, then LuaJIT generates portable bytecode that will run across all its platforms (running the same version of LuaJIT); note that LuaJIT bytecode is different from Lua bytecode, so it can't be run by a Lua interpreter.
A more complex option would be to encrypt the code (possibly before compiling it), but you need to weigh any additional mechanisms (and work on your part) against any possible inconvenience for your users and any loss you have from someone cracking the protection. I'd personally use something sufficiently simple to deter the majority of curious users, as you likely stand no chance against a dedicated hacker anyway.
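The string pre-processing idea mentioned above (turning string literals into numbers that are reassembled at run time) could look roughly like the following. This is only a sketch in Python of a hypothetical pre-processing helper, encode_lua_string, that emits a Lua string.char(...) expression; a real pre-processor would also have to find the string literals in the Lua source, and very long strings may need to be split into several calls.

```python
def encode_lua_string(text):
    """Return a Lua expression that rebuilds `text` at run time via string.char."""
    # Each byte of the string becomes one decimal argument to string.char.
    codes = ",".join(str(b) for b in text.encode("utf-8"))
    return f"string.char({codes})"


# Emit a replacement for the literal in print('Hello world!')
print("print(" + encode_lua_string("Hello world!") + ")")
```

The generated Lua line behaves like the original print call, but the text no longer appears as a readable string in the source or in the compiled chunk's constant table.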
You could use loadstring to get a chunk then string.dump and then apply some transformations like cycling the bytes, swapping segments, etc. Transformations must be reversible. Then save to a file.
Note that anyone with access to your "encryptor" Lua module will know how to decrypt your file. If you write the encryption module in C/C++, anyone with access to its source will too, and anyone with access to the compiled module could simply require it and unobfuscate the code. With an interpreted language this is quite difficult to prevent: you can raise the bar a bit with the techniques above, but raising it enough to require a significant amount of work (the only real deterrent) is very difficult AFAIK.
If you embed the Lua interpreter, then you can do this from C, which makes it significantly harder (assuming a release build with all symbols stripped): a person would have to be comfortable stepping through assembly. But it only takes one capable person to crack the algorithm, and then they can make the code available to others.
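As an illustration of the "reversible transformations" idea (cycling the bytes and swapping segments), here is a minimal sketch in Python operating on an arbitrary byte string. The function names, the offset and split parameters, and the sample bytes are all hypothetical; the same logic would normally live in the Lua or C code that loads the chunk, and it offers no more protection than the caveats above describe.

```python
def obfuscate(chunk, offset, split):
    """Cycle every byte by `offset`, then swap the two segments around `split`."""
    split = min(split, len(chunk))  # assumes 0 <= split
    cycled = bytes((b + offset) % 256 for b in chunk)
    return cycled[split:] + cycled[:split]


def deobfuscate(blob, offset, split):
    """Undo obfuscate(): swap the segments back, then cycle the bytes back."""
    split = min(split, len(blob))
    cut = len(blob) - split  # the first segment now sits at the end
    restored = blob[cut:] + blob[:cut]
    return bytes((b - offset) % 256 for b in restored)


sample = b"\x1bLuaS example chunk bytes"  # stand-in for string.dump output
hidden = obfuscate(sample, offset=37, split=5)
print(hidden != sample)                                      # no longer the raw chunk
print(deobfuscate(hidden, offset=37, split=5) == sample)     # round trip restores it
```

Both steps are exact inverses of each other, which is the property the loader depends on.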
Yo still interested in doing this? :)
I thought I'd add some example code, since the answers here were helpful but didn't get us all the way there. We wanted to save some Lua table information without making it super easy for someone to inject their own code. Serialize your table, use load(str) to turn it into a loadable Lua chunk, and save it with string.dump. With the 'true' parameter, debug information is stripped, and there's really not much left. Yes, you can still see string keys, but it's much better than just saving the naked serialized Lua table.
function tftp.SaveToMSI( tbl, msiPath )
    assert(type(tbl) == "table")
    assert(type(msiPath) == "string")
    local localName = _GetFileNameFromPath( msiPath )
    local file,err = io.open(localName, "wb")
    assert(file, err)
    -- convert the table into a string
    local str = serializer.Serialize( tbl )
    -- create a lua chunk from the string. this allows some amount of
    -- obfuscation, because it looks like gobbledygook in a text editor
    local chunk = string.dump(load(str), true)
    file:write(chunk)
    file:close()
    -- send from /usr to the MSI folder
    local sendResult = tftp.SendFile( localName, msiPath )
    -- remove from the /usr folder
    os.remove(localName)
    return sendResult
end
The output from one small table looks like this in Notepad++ :
LuaS У
Vx#w( # АKА└АJБ┴ JА #
& А &  name
Coulombmetervalue?С╘ ажў

Build system that supports multiple outputs per target

I work in the field of bioinformatics. My daily work involves processing several data files (DNA sequences, alignments, etc.) and producing many result files, so I want to use something like Unix make to automate the whole process, especially to resolve dependencies between different data.
However, Unix make only supports one output per target, as it is designed for software builds, which typically generate one object file from several source files, or one executable from several object files. If you use custom virtual targets, you lose the benefit of timestamp checking. Is there any build system that supports multiple output files per target? If there isn't one, I'm going to build that wheel myself.
Have a look at Drake, which is a replacement of make designed for data workflow management (make for data).
Another option is makepp, which is an improved make. Among other features it supports multiple targets.
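For reference, the timestamp check that a rule with several outputs needs is small. The following Python sketch (all function names are hypothetical, and it is not taken from Drake or makepp) treats a step as out of date if any output is missing or if the oldest output is older than the newest input. It assumes at least one input and one output.

```python
import os


def mtime_or_none(path):
    """Return the file's modification time, or None if it does not exist."""
    try:
        return os.path.getmtime(path)
    except OSError:
        return None


def is_stale(input_mtimes, output_mtimes):
    """True if an output is missing or older than the newest input."""
    if any(t is None for t in output_mtimes):
        return True
    return min(output_mtimes) < max(input_mtimes)


def needs_rebuild(inputs, outputs):
    """Apply is_stale() to real files; a missing input raises OSError."""
    return is_stale(
        [os.path.getmtime(p) for p in inputs],
        [mtime_or_none(p) for p in outputs],
    )
```

A call such as needs_rebuild(["reads.fa"], ["aligned.sam", "stats.txt"]) (file names invented here) would then decide whether the step that produces both outputs has to run again.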

Change in Binary without change in Source Code

I have the following requirement: to find out whether my binary has changed or not.
My source code is unchanged. When I recompile the binary (without any change in the source code), I notice that the binary has changed: not in size, but in contents.
On debugging a little, I found there is something called "Link Time" inside the binary file. This is the actual timestamp of when the binary was linked. Since each compile gives a different timestamp, my binary contents are always different, when they should actually be the same.
Can somebody suggest a way of finding out whether the binary has actually changed due to a change in the source code, and not anything else?
Thanks
Unlike on Windows (where every .obj file has a compile timestamp in its file header), UNIX object files, and in particular ELF files do not encode any kind of timestamp.
However, if your source uses __TIME__ and __DATE__ macros, then the object file produced by compilation will obviously change. Also, all kinds of information, including compilation timestamp could be recorded as part of the debug info, if you are building -g binaries.
Finally, it's possible that the linker you are using does record the link timestamp (as a vendor extension).
Your first task should be to understand where the differences from one build to the next come from.
If from __DATE__ and __TIME__, eliminate them from your source.
If from debug info, compare the binaries after passing them through strip -g.
If from vendor linker extension, see if there is a flag to disable such timestamps. If there isn't one, you'll have to write a tool that compares only the parts you are interested in. E.g. you could use readelf -x .text a.out, etc. to compare only the .text section (you'll also want to compare .data, .rodata, and likely many others).
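A tool that "compares only the parts you are interested in" can be as simple as hashing the file with the known-volatile bytes blanked out. The Python sketch below assumes you already know the offset and length of the timestamp field; the function name, the sample bytes, and the (4, 4) range are invented for illustration, and a real tool would locate the field by parsing the file format.

```python
import hashlib


def digest_ignoring(data, ignored_ranges):
    """SHA-256 of `data` with each (start, length) range overwritten by zeros."""
    masked = bytearray(data)
    for start, length in ignored_ranges:
        end = min(start + length, len(masked))  # never write past the end
        if start < end:
            masked[start:end] = bytes(end - start)
    return hashlib.sha256(bytes(masked)).hexdigest()


# Two hypothetical builds that differ only in a 4-byte link timestamp at offset 4.
build_a = b"CODE" + (1700000000).to_bytes(4, "little") + b"MORE"
build_b = b"CODE" + (1700000600).to_bytes(4, "little") + b"MORE"
volatile = [(4, 4)]

print(build_a == build_b)                                                    # raw bytes differ
print(digest_ignoring(build_a, volatile) == digest_ignoring(build_b, volatile))  # same once masked
```

Any difference outside the masked ranges still changes the digest, so a genuine code change is not hidden.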

How can I quickly search my code using unix?

I frequently search for a token, perhaps a function name, throughout my codebase. My traditional method would be to grep for the term itself. However, the codebase is so large that I can't do this efficiently (it takes minutes).
Is there a way to do this efficiently?
ack (which ignores irrelevant files such as revision control files) is still too slow. ctags only finds declarations, which isn't what I need. I thought something like strigi might work, but I haven't tried it.
I'm on Linux, using vim and the GNU toolchain, on a largely C++ codebase.
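Use find and fgrep. A decent find will restrict the set to files matching some set of criteria, and fgrep will search within each file. Note that fgrep (equivalent to grep -F) treats the pattern as a fixed string rather than a regular expression, which can make it faster than plain grep.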
You could also use an IDE's code indexing features. For example, Eclipse will allow you to go to a function's declaration, and it builds the relevant set in the background while you work. Even vim 7 has some such features, although I don't remember exactly what it does beyond code completion.

Log parser/analyzer in Unix

What are the popular tools people use in Unix to parse and analyze log files? I mean things like counting, finding unique entries, and selecting or copying lines that match certain patterns. Please suggest some tools or keywords. I believe similar questions must have been asked before, but I don't have any idea which keywords to search for. Thanks.
I find it to be a huge failure that many log formats do not separate columns with proper unique field separators. Not because that is best, but because it is the basic premise of unix textutils that operate on table data. Instead they tend to use spaces as separators and quote fields that might contain spaces.
One of the most practical simple changes I made to web log analysis was to move away from the default NCSA log format produced by the nginx web server and instead use tab as the field separator.
Suddenly I could use all of the primitive unix textutils for quick lookups, but especially awk! Print only lines where the user-agent field contains Googlebot:
awk 'BEGIN {FS="\t"} $7 ~ /Googlebot/ { print; }' < logfile
Count the number of requests for each unique request:
awk 'BEGIN {FS="\t"} { print $4; }' < logfile | sort | uniq -c | sort -n
And of course lots of combinations to find specific visitors.
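For comparison, the counting pipeline above can be sketched in Python in a few lines. The helper name count_field and the sample log lines are invented for illustration (the field layout of a real log depends on your own log format); field index 3 corresponds to awk's $4. Unlike awk, this sketch skips lines that have too few fields.

```python
from collections import Counter


def count_field(lines, field_index, sep="\t"):
    """Count how often each value appears in one tab-separated column."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) > field_index:  # skip short or malformed lines
            counts[fields[field_index]] += 1
    return counts


# Hypothetical tab-separated log lines: ip, time, status, request
sample_lines = [
    "10.0.0.1\t2024-01-01T00:00:00\t200\tGET /\n",
    "10.0.0.2\t2024-01-01T00:00:01\t200\tGET /about\n",
    "10.0.0.1\t2024-01-01T00:00:02\t200\tGET /\n",
]

counts = count_field(sample_lines, 3)
# Ascending by count, like the final `sort -n` in the pipeline.
for request, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(n, request)
```

For a real file, pass the open file object instead of the sample list, for example count_field(open("logfile"), 3).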
For regular, nightly checking there is logwatch, which has several different scripts in /usr/share/logwatch/scripts/services that check for specific things (like web server, ftp server, and sshd related entries) in syslog. The default install enables most of them, but you can enable or disable them as you like, or even write your own scripts.
For real-time watching there is multitail.
You might want to try out lnav, a curses-based log analyzer. It has most of the features you would expect from a log parser, such as chronological arrangement of log messages from multiple log files, support for multiple log formats, highlighting of error/warning messages, hotkeys for navigating between error/warning messages, support for SQL queries, and lots more. Take a look at the project's website for screenshots and a detailed list of features.
Take a look at some of the generic log parsers listed here. If you use something like syslog, you can probably get a custom parser/analyzer too. Otherwise, for trivial searches, any scripting language like perl, python or even awk suffices.
Any programming language that lets you open and read files and do string/text manipulation can be used, e.g. Perl, Python, (g)awk, Ruby, PHP, or even Java. They have modules for the file formats you are parsing, e.g. CSV.
