How to obfuscate lua code? - encryption

I can't find anything on Google for some tool that encrypts/obfuscates my lua files, so I decided to ask here. Maybe some professional knows how to do it? (For free).
I have made a simple game in lua now and I don't want people to see the code, otherwise they can easily cheat. How can I make the whole text inside the .lua file to just random letters and stuff?
I used to program in C# and I had this .NET obfuscator called SmartAssembly which works pretty good. When someone would try check the code of my applications it would just be a bunch of letters and numbers together with chinese characters and stuff.
Anyone knows any program that can do this for lua aswell? Just load what file to encrypt, click Encrypt or soemthing, and bam! It works!?
For example this:
print('Hello world!')
would turn into something like
sdf9sd###&/sdfsdd9fd0f0fsf/&

Just precompile your files (chunks) and load binary chunks. luacallows you to strip debugging info. If that is not enough, define your own transformations on the compiled lua, stripping names where possible. There's not really so much demand for lua obfuscators though...
Also, you loose one of the main advantages of using an embedded scripting language: Extensibility.

The simplest obfuscation option is to compile your Lua code as others suggested, however it has two major issues: (1) the strings are still likely to be easily visible in your compiled code, and (2) the compiled code for Lua interpreter is not portable, so if you target different architectures, you need to have different compiled chunks for them.
The first issue can be addressed by using a pre-processor that (for example) converts your strings to a sequence of numbers and then concatenates them back at run-time.
The second issue is not easily addressed without changes to the interpreter, but if you have a choice of interpreters, then LuaJIT generates portable bytecode that will run across all its platforms (running the same version of LuaJIT); note that LuaJIT bytecode is different from Lua bytecode, so it can't be run by a Lua interpreter.
A more complex option would be to encrypt the code (possibly before compiling it), but you need to weight any additional mechanisms (and work on your part) against any possible inconvenience for your users and any loss you have from someone cracking the protection. I'd personally use something sufficiently simple to deter the majority of curious users as you likely stand no chance against a dedicated hacker anyway.

You could use loadstring to get a chunk then string.dump and then apply some transformations like cycling the bytes, swapping segments, etc. Transformations must be reversible. Then save to a file.
Note that anyone having access to your "encryptor" Lua module will know how to decrypt your file. If you make your encrypted module in C/C++, anyone with access to source will too, or to binary of Lua encryption module they could require the module too and unofuscate the code. With interpreted language it is quite difficult to do: you can raise the bar a bit via the above the techniques but raising it to require a significant amount of work (the onlybreal deterent) is very difficult AFAIK.
If you embed the Lua interpreter than you can do this from C, this makes it significantly harder (assuming a Release build with all symbols stripped), person would have to be comfortable with stepping through assembly but it only takes one capable person to crack the algorithm then they can make the code available to others.
Yo still interested in doing this? :)

I thought I'd add some example code, since the answers here were helpful, but didn't get us all the way there. We wanted to save some lua table information, and just not make it super easy for someone to inject their own code. serialize your table, and then use load(str) to make it into a loadable lua chunk, and save with string.dump. With the 'true' parameter, debug information is stripped, and there's really not much there. Yes you can see string keys, but it's much better than just saving the naked serialized lua table.
function tftp.SaveToMSI( tbl, msiPath )
assert(type(tbl) == "table")
assert(type(msiPath) == "string")
local localName = _GetFileNameFromPath( msiPath )
local file,err = io.open(localName, "wb")
assert(file, err)
-- convert the table into a string
local str = serializer.Serialize( tbl )
-- create a lua chunk from the string. this allows some amount of
-- obfuscation, because it looks like gobblygook in a text editor
local chunk = string.dump(load(str), true)
file:write(chunk)
file:close()
-- send from /usr to the MSI folder
local sendResult = tftp.SendFile( localName, msiPath )
-- remove from the /usr folder
os.remove(localName)
return sendResult
end
The output from one small table looks like this in Notepad++ :
LuaS У
Vx#w( # АKА└АJБ┴ JА #
& А &  name
Coulombmetervalue?С╘ ажў

Related

Closure: --namespace Foo does not include Foo.Bar, and related issues

I have a rather big library with a significant set of APIs that I need to expose. In fact, I'd like to expose the whole thing. There is a lot of namespacing going on, like:
FooLibrary.Bar
FooLibrary.Qux.Rumps
FooLibrary.Qux.Scrooge
..
Basically, what I would like to do is make sure that the user can access that whole namespace. I have had a whole bunch of trouble with this, and I'm totally new to closure, so I thought I'd ask for some input.
First, I need closurebuilder.py to send the full list of files to the closure compiler. This doesn't seem supported: --namespace Foo does not include Foo.Bar. --input only allows a single file, not a directory. Nor can I simply send my list of files to the closure compiler directly, because my code is also requiring things like "goog.assers", so I do need the resolver.
In fact, the only solution I can see is having a FooLibrary.ExposeAPI JS file that #require's everything. Surely that can't be right?
This is my main issue.
However, later the closure compiler, with ADVANCED_OPTIMIZATIONS on, will optimize all these names away. Now I can fix that by adding "#export" all over the place, which I am not happy about, but should work. I suppose it would also be valid to use an extern here. Or I could simply disable advanced optimizations.
What I can't do, apparently, is say "export FooLibrary.*". Wouldn't that make sense?
Finally, for working in source mode, I need to do goog.require() for every namespace I am using. This is merely an inconvenience, though I am mentioning because it sort of related to my trouble above. I would prefer to be able to do:
goog.requireRecursively('FooLibrary')
in order to pull all the child namespaces as well; thus, recreating with a single command the environment that I have when I am using the compiled version of my library.
I feel like I am possibly misunderstanding some things, or how Closure is supposed to be used. I'd be interested in looking at other Closure-based libraries to see how they solve this.
You are discovering that Closure-compiler is built more for the end consumer and not as much for the library author.
If you are exporting basically everything, then you would be better off with SIMPLE_OPTIMIZATIONS. I would still highly encourage you to maintain compatibility of your library with ADVANCED_OPTIMIZATIONS so that users can compile the library source with their project.
First, I need closurebuilder.py to send the full list of files to the closure compiler. ...
In fact, the only solution I can see is having a FooLibrary.ExposeAPI JS file that #require's everything. Surely that can't be right?
You would need to specify an --root of your source folder and specify the namespaces of the leaf nodes of your file dependency tree. You may have better luck with the now deprecated CalcDeps.py script. I still use it for some projects.
What I can't do, apparently, is say "export FooLibrary.*". Wouldn't that make sense?
You can't do that because it only makes sense based on the final usage. You as the library writer wish to export everything, but perhaps a consumer of your library wishes to include the source (uncompiled) version and have more dead code elimination. Library authors are stuck in a kind of middle ground between SIMPLE and ADVANCED optimization levels.
What I have done for this case is maintain a separate exports file for my namespace that exports everything. When compiling a standalone version of my library for distribution, the exports file is included in the compilation. However I can still include the library source (without the exports) into a project and get full dead code elimination. The work/payoff balance of this though must be weighed against just using SIMPLE_OPTIMIZATIONS for the standalone library.
My GeolocationMarker library has an example of this strategy.

Supporting long unicode filepaths with System.Data.SQLite

I'm developing an application that needs to be able to create & manipulate SQLite databases in user-defined paths. I'm running into a problem I don't really understand. I'm testing my stuff against some really gross sample data with huge unwieldy unicode paths, for most of them there isn't a problem, but for one there is.
An example of a working connection string is:
Data Source="c:\test6\意外な高価で売れるかも? 出品は手順を覚えれば後はかんたん!\11オークションストアの出品は対象外とさせていただきます。\test.db";Version=3;
While one that fails is
Data Source="c:\test6\意外な高価で売れるかも? 出品は手順を覚えれば後はかんたん!\22今やPCライフに欠かせないのがセキュリティソフト。そのため、現在何種類も発売されているが、それぞれ似\test.db";Version=3;
I'm using System.Data.SQLite v1.0.66.0 due to reasons outside of my control, but I quickly tested with the latest, v1.0.77.0 and had the same problems.
Both when attempting to newly create the test.db file or if I manually put one there and it's attempting to open, SQLiteConnection.Open throws an exception saying only "Unable to open the database file", with the stack trace showing that it's actually System.Data.SQLite.SQLite3.Open that is throwing.
Is there any way I can get System.Data.SQLite to play nicely with these paths? A workaround could be to create and manipulate my databases in a temporary location and then just move them to the actual locations for storage, since I can create and manipulate files normally otherwise. That's kind of a last resort though.
Thank you.
I am guessing you are on a Japanese-locale machine where the default system encoding (ANSI code page) is cp932 Japanese (≈Shift-JIS).
The second path contains:
ソ
which encodes to the byte sequence:
0x83 0x5C
Shift-JIS is a multibyte encoding that has the unfortunate property of sometimes re-using ASCII code units in the trail byte. In this case it has used byte 0x5C which corresponds to the backslash \. (Though this typically displays as a yen sign in Japanese fonts, for historical reasons.)
So if this pathname is passed into a byte-based API, it will get encoded in the ANSI code page, and you won't be able to tell the difference between a backslash meant as a directory separator and one that is a side-effect of multi-byte encoding. Consequently any path with one of the following characters in will fail when accessed with a byte-based IO method:
―ソЫⅨ噂浬欺圭構蚕十申曾箪貼能表暴予禄兔喀媾彌拿杤歃畚秉綵臀藹觸軆鐔饅鷭偆砡纊犾
(Also any pathname that contains a Unicode character not present in cp932 will naturally fail.)
It would appear that behind the scenes SQLite is using a byte-based IO method to open the filename it is given. This is unfortunate, but extremely common in cross-platform code, because the POSIX C standard library is defined to use byte-based filenames for operations like file open().
Consequently using the C stdlib functions it is impossible to reliably access files with non-ASCII names. This sad situation inherits into all sorts of cross-platform libraries and languages written using the stdlib; only tools written with specific support for Win32 Unicode filenames (eg Python) can reliably access all files under Windows.
Your options, then, are:
avoid using non-ASCII characters in the path name for your db, as per the move/rename suggestion;
continue to rely on the system locale being Japanese (ANSI code page=932), and just rename files to avoid any of the characters listed above;
get the short (8.3) filename of the file in question and use that instead of the real one—something like c:\test6\85D0~1\22PC~1\test.db. You can use dir /x to see the short-filenames. They are always pure ASCII, avoiding the encoding problem;
add some code to get the short filename from the real one, using GetShortPathName. This is a Win32 API so you need a little help to call it from .NET. Note also short filenames will still fail if run on a machine with the short filename generation feature disabled;
persuade SQLite to add support for Windows Unicode filenames;
persuade Microsoft to fix this problem once and for all by making the default encoding for byte interfaces UTF-8, like it is on all other modern operating systems.

Store map key/values in a persistent file

I will be creating a structure more or less of the form:
type FileState struct {
LastModified int64
Hash string
Path string
}
I want to write these values to a file and read them in on subsequent calls. My initial plan is to read them into a map and lookup values (Hash and LastModified) using the key (Path). Is there a slick way of doing this in Go?
If not, what file format can you recommend? I have read about and experimented with with some key/value file stores in previous projects, but not using Go. Right now, my requirements are probably fairly simple so a big database server system would be overkill. I just want something I can write to and read from quickly, easily, and portably (Windows, Mac, Linux). Because I have to deploy on multiple platforms I am trying to keep my non-go dependencies to a minimum.
I've considered XML, CSV, JSON. I've briefly looked at the gob package in Go and noticed a BSON package on the Go package dashboard, but I'm not sure if those apply.
My primary goal here is to get up and running quickly, which means the least amount of code I need to write along with ease of deployment.
As long as your entiere data fits in memory, you should't have a problem. Using an in-memory map and writing snapshots to disk regularly (e.g. by using the gob package) is a good idea. The Practical Go Programming talk by Andrew Gerrand uses this technique.
If you need to access those files with different programs, using a popular encoding like json or csv is probably a good idea. If you just have to access those file from within Go, I would use the excellent gob package, which has a lot of nice features.
As soon as your data becomes bigger, it's not a good idea to always write the whole database to disk on every change. Also, your data might not fit into the RAM anymore. In that case, you might want to take a look at the leveldb key-value database package by Nigel Tao, another Go developer. It's currently under active development (but not yet usable), but it will also offer some advanced features like transactions and automatic compression. Also, the read/write throughput should be quite good because of the leveldb design.
There's an ordered, key-value persistence library for the go that I wrote called gkvlite -
https://github.com/steveyen/gkvlite
JSON is very simple but makes bigger files because of the repeated variable names. XML has no advantage. You should go with CSV, which is really simple too. Your program will make less than one page.
But it depends, in fact, upon your modifications. If you make a lot of modifications and must have them stored synchronously on disk, you may need something a little more complex that a single file. If your map is mainly read-only or if you can afford to dump it on file rarely (not every second) a single csv file along an in-memory map will keep things simple and efficient.
BTW, use the csv package of go to do this.

finding duplicate source code

I'm analyzing some legacy code. It is about 80.000 lines of old plsql code. On a fist look there is quite some duplication in the source which needs to be removed. Instead off doing diff's manual and looking at each file there must be some tool/commandline confu out there to detect duplicate lines of source code.
My goal is to make an educated guess about the minimal size of a rewrite of source and about how much actual knowledge is captured in this program. I wrote some a basic static code analyzer to find the amount of control statements IF ELSE FOR etc and Functions in each file.
But duplicated code still needs to be removed from my statistics.
Have you looked at Simian - Similarity Analyser? (Just checked and it's no longer free, but it is available for a period of 15 days for evaluation purposes.)
Simian (Similarity Analyser)
identifies duplication in Java, C#, C,
C++, COBOL, Ruby, JSP, ASP, HTML, XML,
Visual Basic, Groovy source code and
even plain text files. In fact, simian
can be used on any human readable
files such as ini files, deployment
descriptors, you name it.
I have used it in practice and it does work well.
Sonar has duplication detection and claims to support PL/SQL, though I've never used it for that.
You would need to beg/borrow/steal/write a plsql parser and compare the resulting abstract syntax trees. With the size of the code base you have, that might be worthwhile. There would be other uses for the parser once you're done.
How about this:
http://sourceforge.net/projects/sddforeclipse/
It is opensource, and is said to be used by commercial software. It is a plugin to Eclipse, by the way.

How can I quickly search my code using unix?

I frequently search for a token, perhaps a function name, throughout my codebase. My traditional method would be to grep for the term itself. However, the codebase is so large that I can't do this efficiently (it takes minutes).
Is there a way to do this efficiently?
ack (which ignores irrelevent files such as revision control files) is still too slow. ctags only finds declarations, which isn't what I need. I thought something like strigi might work, but I haven't tried it.
I'm on linux, using vim and the GNU toolchain, on a largely C++ codebase.
Use find and fgrep. A decent find will restrict the set to files matching some set of criteria, and fgrep will search within each file. Note that fgrep will be faster than grep.
You could also use an IDE's code indexing features. For example, Eclipse will allow you to go to a function's declaration, and it builds the relevant set in the background while you work. Even vim 7 has some such features, although I don't remember exactly what it does beyond code completion.

Resources