Google Native Client - How to protect the source code? - google-nativeclient

With Google Native Client, can the source code be protected so that, unlike JavaScript, it is not visible in the client?
If so, how? Thanks!

As the name says, Google Native Client uses native code.
That means, your code is compiled, just like with your average executable binary on the desktop. It can be disassembled, but the source code can't be recovered.

Native client means that you are running native code on the client. In most cases, you'll be running i386 or amd64 machine language on your client. If you're using a compiled language, then your users cannot directly recover it. Users could disassemble your software to recover some information about your code, but they cannot recover the original source code (unless it is assembly language). Rewriting a piece of software from the disassembled binary is difficult, but given enough time, it can usually be done. It really depends on how paranoid you are about the people using your code.

Native Client's structural requirements to enable reliable disassembly so that it can perform static analysis can make some techniques for code obfuscation unusable. These are often the same techniques used by malware to make malware analysis difficult, i.e., have two valid interpretations of the instruction stream if decoded by different offsets. Native Client does, however, permit a form of self-modifying code since it has JIT support. Mono uses just-in-time code generation, for example, and the same interfaces can be used to create obfuscated code, as long as the JIT'ted code continue to conform to the NaCl security requirements.
Using the JIT interface would of course make your code non-portable to other CPU architectures.

Related

Run R script and hide the actual code from user

I have created an R code script that:
Reads some data from a database
Makes some transformations and..
exports into a csv the modified table.
This code needs to run in a client's machine, but we need to "hide" the actual code from the user.
Is there any useful suggestions on how we can achieve that?
Up front
... it will be nearly impossible to deploy an R <something> to another computer in a way that prevents curious users from accessing the source code.
From a mailing list conversation in 2011, in response to "I would not like anyone to be able to read the code.",
R is an open source project, so providing ways for you to do this is not
one of our goals.
Duncan Murdoch https://stat.ethz.ch/pipermail/r-help/2011-July/282755.html
(Prof Murdoch was on the R Core Team and R Foundation for many years.)
Background
Several (many?) programming languages provide the ability to compile a script or program into an executable, the .exe you reference. For example, python has tools like py2exe and PyInstaller. The tools range from merely compactifying the script into a zip-ball, perhaps obfuscating the script; ... to actually creating a exe with the script either tightly embedded or such. (This part could use some more citations/research.)
This is usually good enough for many people, by keeping the honest out. I say it that way because all you need to do is google phrases like decompile py2exe and you'll find tools, howtos, tutorials, etc, whose intent might be honestly trying to help somebody recover lost code. Regardless of the intentions, they will only slow curious users.
Unfortunately, there are no tools that do this easily for R.
There are tools with the intent of making it easy for non-R-users to use R-based tools. For instance, RInno and DesktopDeployR are two tools with the intent of creating Windows (no mac/linux) installers that support R or R/shiny tools. But the intent of tools like this is to facilitate the IT tasks involved with getting a user/client to install and maintain R on their computer, not with protecting the code that it runs.
Constrain R.exe?
There have been questions (elsewhere?) that ask if they can modify the R interpreter itself so that it does not do everything it is intended to do. For instance, one could redefine base::print in such a way that functions' contents cannot be dumped, and debug doesn't show the code it's about to execute, and perhaps several other protective steps.
There are a few problems with this approach:
There is always another way to get at a function's contents. Even if you stop print.default and the debugger from doing this, there are others ways to get to the functions (body(.), for one). How many of these rabbit holes do you feel you will accurately traverse, get them all ... with no adverse effect on normal R code?
Even if you feel you can get to them all, are you encrypting the source .R files that contain your proprietary content? Okay, encrypting is good, except you need to decrypt the contents somehow. Many tools that have encrypted contents do so to thwart reverse-engineering, so they also embed (obfuscatedly, of course) the decryption key in the application itself. Just give it time, somebody will find and extract it.
You might think that you can download the key on start-up (not stored within the app), so that the code is decrypted in real-time. Sorry, network sniffers will get the key. Even if you retrieve it over https://, tools such as https://mitmproxy.org/ will render this step much less effective.
Let's say you have recompiled R to mask print and such, have a way to distribute source code encrypted, and are able to decrypt it in a way that does not easily reveal the key (for full decryption of the source code files). While it takes a dedicated user to wade through everything above to get to the source code, none of the above steps are required: they may legally compel you to release your changes to the R interpreter itself (that you put in place to prevent printing function contents). This doesn't reveal your source code, but it will reveal many of your methods, which might be sufficient. (Or just the risk of legal costs.)
R is GPL, and that means that anything that links to it is also "tainted" with the GPL. This means that anything compiled with Rcpp, for instance, will also be constrained/liberated (your choice) by the GPL. This includes thoughts of using RInside: it is also GPL (>= 2).
To do it without touching the GPL, you'd need to write your interpreter (relatively from scratch, likely) without code from the R project.
Alternatives
Ultimately, if you want to release R-based utilities/apps/functionality to clients, the only sure-fire way to allow them to use your code without seeing it is to ... control the computers on which R will run (and source code will reside). I'll add more links supporting this claim as I find them, but a small start:
https://stat.ethz.ch/pipermail/r-help/2011-July/282717.html
https://www.researchgate.net/post/How_to_make_invisible_the_R_code
Options include anything that keeps the R code and R interpreter completely under your control. Simple examples:
Shiny apps, self-hosted (or on shinyapps.io if you trust their security); servers include Shiny Server (both free and commercial versions), RStudio Connect (commercial only), and ShinyProxy. (The list is not known to be exclusive.)
Rplumber is an API server, not a shiny server. The intent is for single HTTP(s) endpoint calls, possibly authenticated, supporting whatever HTTP supports (post, get, etc). This can be served in various ways, see its hosting page for options.
Rserve. I know less about this, but from what I've experienced with it, I've not had as much luck integrating with enterprise systems (where, e.g., authentication and fine-control over authorization is important). This does allow near-raw access to R, so it might not be what you want (especially when the intent is to give to clients who may not be strong R users themselves).
OpenCPU should be discussed, but not as a viable candidate for "protect your code". It is very similar to rplumber in that it provides HTTP endpoints, but it supports endpoints for every exported function in every package installed in its R library. This includes the base package, so it is not at all difficult to get the source code of any function that you could get on the R console. I believe this is a design feature, even if it is perfectly at odds with your intent to protect your code.
Anything that can call R or Rscript. This might be PHP or mod_python or similar. Any web-page serving language that can exec("/usr/bin/Rscript",...) can take its output and turn it around to the calling agent. (It might also be possible, for example, for a PHP front-end to call an opencpu endpoint that only permits connections from the PHP-serving host.)

D3D11 in Metro doesn't support D3DReflect? (Why Not?)

D3D11 in Metro doesn't support D3DReflect.
Why not?
My API uses this to dynamically get the constant buffer sizes of shaders.
Is there any other way to get a constant buffer size dynamically in D3D11 without a ID3D11ShaderReflection object? Or get the constant variables by name?
What if I wanted to make a shader compiler tool for Metro?
What is I wanted to make an Art application that allowed you to dynamically generate complex brushes that requires shader generation. But this doesn't work.
Does Windows(Desktop), OSX, Linux, iOS or Android have these shader limitations?
No, so why on earth does Metro?
See http://social.msdn.microsoft.com/Forums/en-US/wingameswithdirectx/thread/9ae33f2c-791a-4a5f-b562-8700a4ab1926 for some discussion about it.
There is not an official position explaining why they made such restriction, but it is very similar to the whole restriction of dynamic code execution on WinRT. So your application scenario is unfortunately not possible.
Though, It would be feasible to hack/patch d3dcompiler_xx.dll and redirect all dll imports to call another DLL that would use authorized only APIs, but that's quite some work, and It is not even sure that this is legal (even by extracting the dll code from original d3dcompiler and rebuilding a new dll).
Another option for your scenario is to send the shader over the internet to a server that compiles and returns bytecode and reflection info... far from being perfect.
Among the platforms you mention, probably iOS is the one that could have the same restriction (I don't develop on this platform, so I can't confirm it).

Encrypting R script under MS-Windows

I have a bunch of R scripts which I am running on a Windows machine and want to ensure that the code remains unread by those not intended to see it. On a Linux box, I could wrap the R code in a bash script #! and make an encrypted (and perhaps even a limited-life) executable shell script. What are my options to do something on similar lines under Windows?
My answer is a bit late, but I believe this is a good question. Unfortunately, I don't believe that there is a solution, or at least an easy one, at the present time.
The difficulty is common because, for most interpreted languages, including R, it is often possible to turn on logging and inspection of all commands being run. This can negate many tricks to obfuscate the code.
For those who prefer to think of code being open == good, one should know that a common reason to obfuscate the code is if one is consulting with a client that hires multiple vendors. It is not uncommon for a client to take scripts from vendor A and ask vendor B why it doesn't work with their system. (This may be done by a low-level IT flunkie, rather than someone responsible for the NDA contracts.) If A & B are competitors, A's code has just been handed to B. When scripts == serious programs, then serious code has been given away.
The ways I've seen this addressed are:
Make a call to a compiled language, and use standard protections available there.
Host the executable on a different server, and use calls to the server to execute the calculations. (In R, there are multiple server-side options.)
Use compiled (preprocessed / bytecode) code within the language.
Option 2 is actually easier and better when the code may be widely distributed, not just for IP reasons. A major advantage is that it lets you upgrade the code without having to go through the pain of a site-wide release process. If new libraries are needed, no problem - update the server.
Option 3 is done in Matlab with .p files, and can be done with py2exe for Python on Windows. In R, the new bytecode compilation may be analogous, but I am not familiar enough with it to address any differences between .Rc files in the R context and .p files in the Matlab context. For more info on the compiler, see: http://www.inside-r.org/r-doc/compiler/compile
Hosting computations on the server is great for working with unsophisticated users, because it is easier to iterate quickly in response to bugs or feature requests. The IP protection is simply a benefit.
This is not a specifically R-oriented strategy. (And it's a bit unclear what your constraints or goals really are anyway.) If you want a cross-platform encryption method, you should look into the open-source program TrueCrypt. It supports creating encrypted files that can be mounted as volumes on any machine that supports the volume formatting method. I have tested this across the Mac PC divide , since the Mac can read FAT files, but have no experience with how it might work across the Linux-PC chasm.
(Their TODO list for Windows includes;"Command line options for volume creation (already implemented in Linux and Mac OS X versions)". So I don't see any clear way to use this from within R without you running the program from the OS.)
I don't think this is possible because the R interpreter has to be able to decrypt and read the code in order to execute it which means that whoever is using that interpreter will also be able to decrypt and read the code.
I am by no means an expert, so I reserve the right to be 100% wrong about that statement.
I believe the best solution is to ensure value comes from the expertise and services provided by your company and it's employers---not from keeping secrets.
Failing that, you could try separating the code into a client/server model. That way the client just sends data and receives results---they never have access to the code that runs on the server.
However, the scientist in me just said "that solution sucks and I would never trust results provided under such conditions".

Encrypting Scripts for Embedding in Text Files

I'm working on a closed-source game that uses a scripting language for automation. Almost all of the game logic is handled by scripts. Scripts can be compiled to a bytecode format, but due to the nature of the language, identifiers must be preserved. Compiled scripts can be embedded in other text-based resource formats using a binary-to-text encoding.
I want to encrypt the compiled scripts to protect the source during distribution, but because the language, bytecode format, and binary-to-text encoding scheme are all proprietary, do I need to worry about encryption at all? If so, should I simply perturb some bytes and call it a day, or should I make use of a fully featured encryption solution? Encryption should not increase the size of the executable unduly, because scripts can be large and load times are important.
On Windows, the size of the executable has no impact on load times because the exe is just mapped into memory and then paged in as needed. I can't imagine why that would not be true for *nix as well.
So, if the scripts don't need to change separately from your .exe, you could imbed them into the .exe, that would make them difficult for users to change even if they could find them. I wrote a little tool once that turned data files into .obj files that made it really easy to imbed data into my exe - it turned out to be pretty easy to write an object file that contains only data.
Of course, if you really care about protecting this data than full encryption in your only choice, but if you are just trying to discourage casual hacking, making the files hard to get at might be good enough.
You shouldn't assume that people can't read binary proprietary formats. There are many people who are very good at reverse-engineering protocols without any documentation.
So if you want to keep your source safe, you need some real security. The only problem is that if you encrypt the files, you'll need to give your users the decryption key in order to play the game, and when you do that then its only a matter of time before someone works out how to get the key and use it to decrypt all the files.
So basically, there's not much you can do unfortunately. You could try obfuscating your code, but even that's not going to stop everyone.
What you're talking about isn't going to be encryption, because you're going to have to ship the decryption key with it. It's just obfuscation. No matter how much you try to hide the decryption key, if your program can find it, so can the user.
So once you understand that we're just talking about various obfuscation schemes, the question is how much obfuscation you need. Likely the proprietary byte-compiling is a higher barrier than the encryption would be, and I'd call it a day. Anyone who wants to trace the logic can just put a debugger on it whether you encrypt or not. If they've already reverse-engineered your run-time engine to work out the byte-codes, then they're already in the portion of the code that has the unencrypted data.
That said, if you find the identifiers in the file to be problematic you can mechanically converts them to random strings prior to byte-compiling.
Encryption would not buy you much here.
Basically, whatever encryption layer you add, the executable itself must be able to perform the decryption in order to run the scripts. You lock the door, but you leave the key in the lock. This is unavoidable.
What encryption does is that it somewhat raises the bar on who may access the data. It requires some disassembly skills. Simply embedding the files in the executable will already filter out the casual not-very-good hacker. Those who are not deterred by such embedding are also those who will be able to follow the data processing path, find the decryption logic, and siphon out the decrypted code at will. A layer of encryption may also increase the feeling of importance: that which was encrypted is certainly worthwhile. Hence, trying too funky things may just make your situation worse, not better.
On the other hand, embedding the files in the executable binary is probably a good idea. It would remove the need to locate them at runtime on the filesystem (locating things at runtime is known to be somewhat more difficult on Unix systems than on Windows, because of hard links: on Windows, an executable can easily obtained its own path, but on Unix the presence of hard links means that "the" executable path is ill-defined).

Why does the Adobe Alchemy Tool create faster running flash byte code than the flex compiler?

I have seen a few blog entries on this and have had a discussion or two with my team mates but I would like to see what the stack overflow community thinks.
So why does the Adobe Alchemy Tool create so much faster running flash byte code than the flex compiler?
Also, when will the flex compiler be able to make similar performance gains?
Will it require programmer specific use of special Array's or something of that nature to get the same performance?
Alchemy is an implementation of LLVM in ActionScript. Simply put, it's an virtual machine that uses a ByteArray as it's memory store.
The C code compiled by Alchemy has direct access to "memory" (via some opcodes introduced in Flash 10), allowing it to chunk memory around at it's leisure (including pointers to objects). This results in some, but by no means all, code running faster. Some types of code will actually run slower in Alchemy due to it being a VM running on top of the AVM (another VM).
Additionally, Alchemy does not have native access to ActionScript classes and must access them through interop classes.
The alchemy tool creates code that uses instructions in the flash player that aren't available to the regular compiler (and the talk is that these instructions were exposed especially for alchemy).
Whether the regular compiler will eventually make similar gains, hopefully. It's been proven a few times that the compiler creates substandard code, and there are a couple of projects which optimise the generated code. These may shame Adobe into improving.
Chances are, no, there won't be anything special a programmer needs to do to get these performance gains (though check out the optimising blogs, writing loops in a particular way means they can be optimised better).

Resources