Methods to discourage reverse engineering of an OpenCL kernel

I am preparing my OpenCL-accelerated toolkit for release. Currently, I compile my OpenCL kernels into binaries targeted at a particular card.
Are there other ways of discouraging reverse engineering? Right now, I have many OpenCL binaries in my release folder, one for each kernel. Would it be better to splice these binaries into one single binary, or even embed them in the host binary and read them in using a special offset?

OpenCL 2.0 and SPIR-V can be used for this, but they are not available on all platforms yet.

Encrypt the binaries. Keep the keys on a server and have clients request them at the time of use. Of course, the keys should be encrypted too (using a variable value such as the server time, maybe). Then decrypt on the client and use the result as the binary kernel.
I'm not an encryption pro, but if a single algorithm could be cracked within several months (say, the time until the next version of your GPGPU software), I would apply multiple algorithms multiple times to make it harder. Even a simple unknown algorithm of your own, such as reversing the order of the bits of all the data (the 1st bit goes to the nth position, the nth goes to the 1st), should make it look hard to level-1 hackers.
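To make the bit-reversal idea concrete, here is a minimal C++ sketch. The function names are made up for illustration, and this is obfuscation only, not real encryption. Reversing the bit order of a whole buffer is the same as reversing the byte order and then reversing the bits inside each byte, and applying it twice gives back the original data.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Reverse the bit order inside one byte (bit 0 <-> bit 7, bit 1 <-> bit 6, ...).
static std::uint8_t reverse_bits(std::uint8_t b) {
    b = static_cast<std::uint8_t>((b & 0xF0u) >> 4 | (b & 0x0Fu) << 4);
    b = static_cast<std::uint8_t>((b & 0xCCu) >> 2 | (b & 0x33u) << 2);
    b = static_cast<std::uint8_t>((b & 0xAAu) >> 1 | (b & 0x55u) << 1);
    return b;
}

// Hypothetical helper: reverse the bit order of the whole buffer, so the
// first bit ends up last and the last bit ends up first.
// The transform is its own inverse: scramble(scramble(x)) == x.
std::vector<std::uint8_t> scramble(std::vector<std::uint8_t> data) {
    std::reverse(data.begin(), data.end());    // swap byte positions
    for (auto& b : data) b = reverse_bits(b);  // then flip bits within each byte
    return data;
}
```

You would run scramble() once over the kernel binary before shipping it, and once more on the client right before handing the bytes to the OpenCL runtime.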
Warning: some profiling tools can capture the kernel code at run time, so you could add many (maybe hundreds of) trivial kernels that carry no performance penalty to hide it in a crowded timeline, or disable profiling in the kernel options, or deliberately introduce an error (maybe some broken events in the queues) and then restart so that the profiler cannot initialize.
Maybe you could obfuscate the final C99 code so it becomes unreadable by humans. If someone can still read it, he/she doesn't need to hack anything in the first place.
Maybe most effectively, don't do anything: just obtain copyright protection for your genuine algorithm and show it in a txt file, so they can look but would not dare copy it for money.
If the kernel can be rewritten as an "interpreted" version without a performance penalty, you can send bytecode from the server to the client. When the client-side user checks the profiler, he/she sees only the interpreter code and not the real working algorithm, because that depends on the bytecode from the server, which arrives as "data" (a buffer). For example, c=a+b becomes if(...) else if(...) else(case ...) and has no meaning without the data feed.
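A rough sketch of the interpreter idea, written as host-side C++ so it can be run anywhere. The opcode set and the function name are invented for this example; in a real kernel the same loop would live in OpenCL C, with the program passed in as a buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a tiny stack machine.
enum Op : std::uint8_t { LOAD_A = 0, LOAD_B = 1, ADD = 2, SUB = 3, MUL = 4 };

// Evaluate `program` for the inputs a and b. The code below only shows a
// generic dispatch loop; the actual formula lives in the bytecode.
// Returns 0.0f for malformed programs (stack overflow/underflow, bad opcode).
float interpret(const std::vector<std::uint8_t>& program, float a, float b) {
    constexpr std::size_t kStackSize = 16;
    float stack[kStackSize];
    std::size_t top = 0;
    for (std::uint8_t op : program) {
        if (op == LOAD_A || op == LOAD_B) {
            if (top == kStackSize) return 0.0f;      // stack overflow
            stack[top++] = (op == LOAD_A) ? a : b;
            continue;
        }
        if (top < 2) return 0.0f;                    // stack underflow
        const float rhs = stack[--top];
        const float lhs = stack[--top];
        switch (op) {
            case ADD: stack[top++] = lhs + rhs; break;
            case SUB: stack[top++] = lhs - rhs; break;
            case MUL: stack[top++] = lhs * rhs; break;
            default:  return 0.0f;                   // unknown opcode
        }
    }
    return top > 0 ? stack[top - 1] : 0.0f;
}
```

With this, c=a+b is the three-byte program {LOAD_A, LOAD_B, ADD}. Someone reading the interpreter alone learns nothing about which formula is actually evaluated.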
On top of all this, you could buy time against evil people reverse engineering it: pick variable names that trigger his/her "selective perception" so he/she can't focus for a while. Meanwhile, you develop a meaner version. For example, c=a+b becomes bulletsToDevilsEar=breakDevilsLeg+goodGame


How are we actually supposed to include our OpenCL code?

How are we actually supposed to include our OpenCL code in our C projects?
We can't possibly be supposed to ship our .cl files along with our executable for the executable to find them and load them at runtime, because that's stupid, right?
We can't be supposed to use some stringify macro because a) that's apparently not portable/leads to undefined behaviour, b) it all breaks down if you use commas not enclosed in brackets, like when defining many variables of the same type (I've spent an hour here looking for a solution to that and there doesn't seem to be one that actually works), and c) that's kind of stupid.
Are we expected to write our code into C string literals like "int x, y;\n" "float4 p;\n"? Because I'm not doing that. Are we supposed to do a C include-style hexdump of our .cl files? That seems inconvenient. What are we actually supposed to do?
It's bad enough that all these approaches basically mean that you have to ship your program with your OpenCL code essentially open sourced, when your OpenCL code is probably the last thing you want open sourced. On top of that, it seems every OpenCL project I've seen uses one of the approaches listed above. It just doesn't seem right at all; it's like the people who made OpenCL forgot about something.
This thread: OpenCL bytecode running on another card mentions SPIR, a "platform-portable intermediate representation for OpenCL device programs". Other than that, you are basically restricted to the options you already mentioned.
Personally, I began to use C++11 raw string literals to get rid of my nasty stringify-macros. Don't know if C++ is an option for you, however.
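For reference, a small sketch of the raw string literal approach. The kernel and the variable name are made up for illustration. Note that the declaration with a comma inside the kernel body, which trips up stringify macros, is no problem here.

```cpp
#include <string>

// Hypothetical kernel source embedded with a C++11 raw string literal.
// Everything between R"CLC( and )CLC" is taken verbatim, newlines and
// commas included, so no escaping or line-by-line quoting is needed.
static const std::string kVectorAddSource = R"CLC(
__kernel void vector_add(__global const float* a,
                         __global const float* b,
                         __global float* c)
{
    size_t i = get_global_id(0);
    float x = a[i], y = b[i];
    c[i] = x + y;
}
)CLC";
```

The string can then be handed to clCreateProgramWithSource via kVectorAddSource.c_str(), exactly as if it had been read from a .cl file.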
Concerning your rejection of the "ship our .cl files along with our executable" approach: I don't see why this is inherently stupid -- the CL "shaders" are an application resource like all other separate files beside the executable, and thus are part of the "application bundle". It's perfectly reasonable to have this kind of file, and each operating system has its way to deal with it (in win32, the program directory is the bundle, OSX has its own bundle concept, etc...).
Now, if you are worried about other people peeking into your OpenCL code, you can still apply some obfuscation methods (e.g. encrypt your .cl-files by a key which is more or less cleverly hidden in your executable).
[edit/sidenote]: We could also investigate how other companies deal with this issue in the context of, for example, OpenGL/Direct3D shaders. In my limited experience, gaming companies tend to dump their shaders in text form somewhere in their application directory, for all to see (and even to tamper with). So in the gaming world at least, there is no great deal of secrecy in that respect... I wonder what Adobe or CAD software companies do in their professional software.

Use cases for self-modifying code?

On a Von Neumann architecture, program and data are both stored in memory, so a program can modify itself. Is this useful for a programmer? Could you give some examples?
One (questionable) use case that comes to my mind is metamorphic computer viruses. These are malicious pieces of software that conceal themselves from signature-based detection by rewriting their own machine code to a semantically equivalent representation that looks different.
Another (more complex, but also more common) use case is trampolining, a technique based on dynamic code generation to solve certain problems with nested function calls.
JIT compilation
The most common usage of dynamic code generation that I can think of is JIT (just-in-time) compilation. Programs for modern platforms like .NET or Java are not compiled into native machine code, but into some kind of intermediate language (called bytecode). This bytecode is then interpreted when the program is executed (by a virtual machine written for the target architecture). At the same time, a background process checks which parts of the code are executed very often. These parts then have a good chance of being dynamically compiled into native machine language for maximum performance. All this happens during the run time of the program!
Security implications
One thing to keep in mind is that the possibility to interpret data as code is useful for exploiting security holes in computer software, which is why the trend in modern hardware and operating systems is to enable and, if possible, even enforce the separation of code and data (also see NX bit and DEP).
I can best answer this by referring you to an answer to a similar (exceptionally well written and answered) question, also on StackOverflow - Homoiconic and "unrestricted" self modifying code + Is lisp really self modifying?. The answer focuses on Lisp, a family of languages known for taking "code is data" to the next level, and explores the uses of that in AI.

Language without explicit memory alloc/dealloc AND without garbage collection

I was wondering if it is possible to create a programming language without explicit memory allocation/deallocation (like C, C++ ...) AND without garbage collection (like Java, C#...) by doing a full analysis at the end of each scope?
The obvious problem is that this would take some time at the end of each scope, but I was wondering if it has become feasible with all the processing power and multiple cores in current CPU's. Do such languages exist already?
I also was wondering if a variant of C++ where smart pointers are the only pointers that can be used, would be exactly such a language (or am I missing some problems with that?).
Well after some more research apparently it's this:
I was wondering why this isn't more popular. The disadvantages listed there don't seem that serious, and the overhead shouldn't be that large in my opinion. A (non-interpreted, properly written from the ground up) language with C family syntax with reference counting seems like a good idea to me.
The biggest problem with reference counting is that it is not a complete solution and is not capable of collecting a cyclic structure. The overhead is incurred every time you set a reference; for many kinds of problems this adds up quickly and can be worse than just waiting for a GC later. (Modern GC is quite advanced and awesome - don't count it down like that!!!)
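The cycle problem is easy to reproduce with C++ smart pointers. The sketch below uses a made-up Node type and helper: two nodes point at each other, the local handles go out of scope, and the function reports how many nodes are still alive. With strong links in both directions nothing is freed; with a std::weak_ptr for the back link everything is.

```cpp
#include <memory>

struct Node {
    static int alive;                   // number of live Node objects
    std::shared_ptr<Node> strong_next;  // owning link, bumps the reference count
    std::weak_ptr<Node> weak_next;      // non-owning link, does not
    Node() { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

// Hypothetical demo helper: build a two-node cycle, drop the local handles,
// and return how many nodes were NOT destroyed.
int leaked_after_cycle(bool use_weak_back_link) {
    const int before = Node::alive;
    std::weak_ptr<Node> watch_b;        // observer, only used for cleanup
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        watch_b = b;
        a->strong_next = b;
        if (use_weak_back_link) b->weak_next = a;    // a -> b, b ~> a
        else                    b->strong_next = a;  // a -> b, b -> a (cycle)
    }
    const int leaked = Node::alive - before;
    // Break the cycle by hand so the demo itself does not leak memory.
    if (auto b = watch_b.lock()) b->strong_next.reset();
    return leaked;
}
```

leaked_after_cycle(false) reports both nodes as still alive, while leaked_after_cycle(true) reports none. A language built purely on reference counting needs some answer to this, either weak references chosen by the programmer or an additional cycle detector.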
What you are talking about is nothing special, and it shows up all the time. The C or C++ variant you are looking for is just plain regular C or C++.
For example write your program normally, but constrain yourself not to use any dynamic memory allocation (no new, delete, malloc, or free, or any of their friends, and make sure your libraries do the same), then you have that kind of system. You figure out in advance how much memory you need for everything you could do, and declare that memory statically (either function level static variables, or global variables). The compiler takes care of all the accounting the normal way, nothing special happens at the end of each scope, and no extra computation is necessary.
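As a rough illustration of that style, assuming a made-up Particle type and a budget of four objects: all storage is a global array sized at compile time, and "allocation" is just marking a slot as used. No new, delete, malloc or free is involved. The slot reuse goes a bit beyond plain static variables, but the memory budget is still fixed when the program is written.

```cpp
#include <array>
#include <cstddef>

struct Particle {
    float x, y;
    bool in_use;
};

// The whole memory budget is fixed here, when the program is written.
constexpr std::size_t kMaxParticles = 4;
static std::array<Particle, kMaxParticles> g_particles{};  // static storage

// Hand out a free slot, or nullptr when the fixed budget is exhausted.
Particle* acquire_particle() {
    for (auto& p : g_particles) {
        if (!p.in_use) {
            p = Particle{0.0f, 0.0f, true};
            return &p;
        }
    }
    return nullptr;
}

// Give a slot back so it can be reused later.
void release_particle(Particle* p) {
    if (p != nullptr) p->in_use = false;
}
```

The worst-case memory use is known from the declaration alone, which is the property embedded and real-time code usually cares about.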
You can even configure your runtime environment to have a statically allocated stack space (this one isn't really under the compiler's control, more linker and operating system environment). Just figure out how deep your function call chain goes, and how much memory it uses (with a profiler or similar tool), and set it in your link options.
Without dynamic memory allocation (and thus no deallocation through either garbage collection or explicit management), you are limited to the memory you declared when you wrote the program. But that's ok, many programs don't need dynamic memory, and are already written that way. The real need for this shows up in embedded and real-time systems when you absolutely, positively need to know exactly how long an operation will take, how much memory (and other resources) it will use, and that the running time and the use of those resources can't ever change.
The great thing about C and C++ is that the language requires so little from the environment, and gives you the tools to do so much, that smart pointers or statically allocated memory, or even some special scheme that you dream up can be implemented. Requiring their use, and the constraints you put on yourself, just becomes a policy decision. You can enforce that policy with code auditing (use scripts to scan the source or object files, and don't permit linking to the dynamic memory libraries).

Real-time control of Windows Console game

Another quick question: I want to make a simple console-based game, nothing too fancy, just a weekend project to get more familiar with C. Basically I want to make Tetris, but I've run into one problem:
How do I let the game engine run and, at the same time, wait for input? Obviously cin or scanf is useless for me.
You're looking for a library such as ncurses.
Many Rogue-like games are written using ncurses or similar.
There are two ways to do it:
The first is to run two threads; one waits for input and updates state accordingly while the other runs the game.
The other (more common in game development) way is to write the game as one big loop that executes many times a second, updating game state, redrawing the screen, and checking for input.
But instead of blocking when you get key input, you check for the presence of pending keypresses, and if nothing has happened, you just continue through your loop. If you have multiple input sources (keyboard, network, etc.) they all get put there in the loop, checking one after another.
Yes, it's called polling. No, it's not efficient. But high-end games are usually all about pulling the maximum performance and framerates out of the computer, not running cool.
For added efficiency, you can optionally block with a timeout -- saying "wait for a keypress, but no longer than 300 milliseconds" so you can continue on with your loop.
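A minimal sketch of the single-loop approach. The GameState fields, the key bindings and the poll_key callback are all assumptions made for this example; poll_key stands for whatever non-blocking check your platform offers and is expected to return -1 when no key is waiting.

```cpp
#include <chrono>
#include <functional>
#include <thread>

struct GameState {
    int piece_x = 4;    // column of the falling piece
    int ticks = 0;      // number of game updates so far
    bool quit = false;
};

// One big loop: drain pending input without blocking, update, draw, sleep.
void run_game_loop(GameState& state,
                   const std::function<int()>& poll_key,
                   int max_frames,
                   std::chrono::milliseconds frame_time) {
    for (int frame = 0; frame < max_frames && !state.quit; ++frame) {
        int key;
        while ((key = poll_key()) != -1) {   // never waits for a keypress
            if (key == 'a')      --state.piece_x;
            else if (key == 'd') ++state.piece_x;
            else if (key == 'q') state.quit = true;
        }
        ++state.ticks;                       // advance the game (gravity etc.)
        // redraw the screen here
        std::this_thread::sleep_for(frame_time);  // cap the frame rate
    }
}
```

On Windows, a poll_key built from _kbhit() and _getch() in conio.h would be one option; with ncurses, getch() after nodelay(stdscr, TRUE) behaves similarly.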
select() comes to mind, but there are other ways of waiting or checking for input as well.
You could work out how to change stdin to non-blocking, which would enable you to write something like Tetris, but the game might be more directly expressed in an event-driven paradigm. Maybe it's a good excuse to learn Windows programming.
Anyway, if you want to go the console route and you are using the Microsoft compiler, then you should have kbhit() available (via conio.h), which can tell you whether a call to fgetc on stdin would block.
Actually, I should mention that the MinGW gcc compiler 3.4.5 also supports kbhit().

Make your JAR impossible to decompile

How can I package my Java application into an executable jar that cannot be decompiled (for example, by Jadclipse)?
You can't. If the JRE can run it, an application can de-compile it.
The best you can hope for is to make it very hard to read (replace all symbols with combinations of 'l' and '1' and 'O' and '0', put in lots of useless code and so on). You'd be surprised how unreadable you can make code, even with a relatively dumb translation tool.
This is called obfuscation and, while not perfect, it's sometimes adequate.
Remember, you can't stop the determined hacker any more than the determined burglar. What you're trying to do is make things very hard for the casual attacker. When presented with the symbols O001l1ll10O, O001llll10O, OO01l1ll10O, O0Ol11ll10O and O001l1ll1OO, and code that doesn't seem to do anything useful, most people will just give up.
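To show how little it takes to produce names like these, here is a small sketch of the renaming step only. The function is hypothetical and not taken from any real obfuscator; it maps a symbol index to an identifier made of 'O', '0', 'l' and '1', keeping the first character a letter so the result is still a valid identifier.

```cpp
#include <string>

// Map a symbol index to a look-alike identifier such as "OOOOOOOOOl1".
// Distinct indices give distinct names as long as the index fits into the
// chosen length (2 * 4^(length-1) names are available).
std::string obfuscated_name(unsigned index, unsigned length = 11) {
    static const char alphabet[] = {'O', '0', 'l', '1'};
    if (length == 0) return std::string();
    std::string name(length, 'O');
    // Fill from the right with base-4 digits of the index.
    for (unsigned pos = length; pos-- > 1; ) {
        name[pos] = alphabet[index % 4];
        index /= 4;
    }
    // First character: letters only, since an identifier cannot start with a digit.
    name[0] = (index % 2 == 0) ? 'O' : 'l';
    return name;
}
```

A real obfuscator also has to rename every use of a symbol consistently and leave public entry points alone, which is where the actual work is.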
First you can't avoid people reverse engineering your code. The JVM bytecode has to be plain to be executed and there are several programs to reverse engineer it (same applies to .NET CLR). You can only make it more and more difficult to raise the barrier (i.e. cost) to see and understand your code.
The usual way is to obfuscate the source with some tool. Classes, methods and fields are renamed throughout the codebase, even with invalid identifiers if you choose to, making the code next to impossible to comprehend. I had good results with JODE in the past. After obfuscating, use a decompiler to see what your code looks like...
Next to obfuscation you can encrypt your class files (all but a small starter class) with some method and use a custom class loader to decrypt them. Unfortunately the class loader class can't be encrypted itself, so people might figure out the decryption algorithm by reading the decompiled code of your class loader. But the window to attack your code got smaller. Again this does not prevent people from seeing your code, just makes it harder for the casual attacker.
You could also try to convert the Java application to some windows EXE which would hide the clue that it's Java at all (to some degree) or really compile into machine code, depending on your need of JVM features. (I did not try this.)
GCJ is a free tool that can compile to either bytecode or native code. Keep in mind that this does sort of defeat the purpose of Java.
A little late I know, but the answer is no.
Even if you write in C and compile to native code, there are disassemblers / debuggers which will allow people to step through your code. Granted - debugging optimized code without symbolic information is a pain - but it can be done, I've had to do it on occasion.
There are steps that you can take to make this harder - e.g. on Windows you can call the IsDebuggerPresent API in a loop to see if somebody is debugging your process, and if yes and it is a release build - terminate the process. Of course a sufficiently determined attacker could intercept your call to IsDebuggerPresent and always return false.
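A minimal sketch of that check. The wrapper name is made up; IsDebuggerPresent itself is the Win32 call mentioned above. On other platforms this sketch simply reports false, which is a placeholder and not a real detection.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: true if a user-mode debugger is attached (Windows only).
bool debugger_attached() {
#ifdef _WIN32
    return IsDebuggerPresent() != 0;
#else
    return false;  // placeholder: no check implemented for other platforms
#endif
}
```

In a release build you would call debugger_attached() periodically and exit when it returns true, keeping in mind that the call can be hooked as described above.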
There are a whole variety of techniques that have cropped up - people who want to protect something and people who are out to crack it wide open, it is a veritable arms race! Once you go down this path - you will have to constantly keep updating/upgrading your defenses, there is no stopping.
This is not my own practical solution, but I think this is a good collection of resources and tutorials for getting it done to the highest level of satisfaction.
A suggestion from this website (Oracle community):
1. (Clean way) Obfuscate your code. There are many open source and free obfuscator tools; here is a simple list of them: [Open source obfuscators list]. These tools make your code unreadable (though you can still decompile it) by changing names. This is the most common way to protect your code.
2. (Not so clean way) If you have a specific target platform (like Windows), or you can have different versions for different platforms, you can write a sophisticated part of your algorithms in a low-level language like C (which is very hard to decompile and understand) and use it as a native library in your Java application. It is not clean, because many of us use Java for its cross-platform abilities, and this method diminishes that ability.
And the one below is a step-by-step guide to follow:
Keep adding your solutions; we need more of this.
