How much success is there working on ASP.NET decompiled by Reflector? - asp.net

I just finished a small project where changes were required to a pre-compiled, but no longer supported, ASP.NET web site. The code was ugly, but it was ugly before it was even compiled, and I'm quite impressed that everything still seems to work fine.
It took some editing, e.g. to remove control declarations, as they get put in a generated file, and conflict with the decompiled base class, but nothing a few hours didn't cure.
Now I'm just curious as to how many others have had how much success doing this. I would actually like to write a CodeProject article on defining, if not automating, the reverse engineering process.

Due to all the compiler sugar that exists in the .NET platform, you can't decompile a binary into the original code without extremely sophisticated decompilers. For instance, the compiler creates classes in the background to handle enclosures. Automating this kind of thing seems like it would be a daunting task. However, handling expected issues just to get it to compile might be scriptable.

Will:
Due to all the compiler sugar that exists in the .NET platform
Fortunately this particular application was incredibly simple, but I don't expect to decompile into the original code, just into code works like the original, or maybe even provides an insight into how the original works, to allow 'splicing' in of new code.

i had to do something similar, and i was actually happier than if i had the code. it might have taken me less time to do it, but the quality of the code after the compiler optimized it was probably better than the original code. So yes, if its a simple application, is relatively simple to do reverse engineer it; on the other hand i would like to avoid having to do that in the future.

If it was written in .NET 1.1 or .NET 2.0 you'll have a lot more success than anything compiled with the VS 2008 compilers, mainly because of the syntactic suger that the new language revisions brought in (Lambda, anonymous classes, etc).
As long as the code wasn't obfuscated then you should be able to use reflector to get viable code, if you then put it into VS you should immidiately find errors in the reflected code.
Be on the look out for variables/ method starting with <>, I see that a lot (particularly when reflecting .NET 3.5).
The worst you can do is export it all to VS, hit compile and determine how many errors there are and make a call from that.
But if it's a simple enough project you should be able to reverse engineer from reflector, at least use reflector to get the general gist of what the code is doing, and then recode yourself.

Related

Is roslyn a reasonable substitute for reflection related tasks?

Have a question, never used roslyn before so i'm wondering about maybe experimenting it in a task that i would normally use reflection.
I'm given an external dll, i need to go over some classes in that dll and extract some metadata on them.
Like for example, the class name, property names and types and such.
I would normally use reflection to do it. Should be a super simple task.
But i've been told that this can be achieved using roslyn.
Can it? From what i'm seeing, Roslyn can parse a class but i need to give it the code that represents this calss as text. How would i get the code as text in an already complied code?
Is that even a reasonable scenario to use roslyn? Does it worth the effort?
Thanx!
If all you need is information that's already easily available via reflection, then Roslyn is likely to make it much harder. There's quite a lot of setup required, which can be error-prone and brittle in the face of new releases, in my experience.
I would typically use reflection for anything where the starting point is an assembly. When the starting point is source code, that's when it makes more sense to use Roslyn.
When Roslyn is the right tool for the job, it's amazing - but it doesn't sound like that's the case here.
When you using Roslyn, you have lexical info and symbolic info.
The lexical info won't help you, you must use the symbolic info, for that, you must have compilation and you can create it for compiled code.
With the compilation, you indeed can achieve types info but not runtime info. Anyway, using reflection for this is much straight forward.
When your mission is related to tree transverse or syntax rewriting, Roslyn is perfect, but for metadata info, it's the wrong usage.
It depends on your specific needs but maybe there are other "tools" that more suitable for your task (e.g. cecil or dnlib)

make your Jar not to be decompiled

How can I package my Java application into an executable jar that cannot be decompiled (for example , by Jadclipse)?
You can't. If the JRE can run it, an application can de-compile it.
The best you can hope for is to make it very hard to read (replace all symbols with combinations of 'l' and '1' and 'O' and '0', put in lots of useless code and so on). You'd be surprised how unreadable you can make code, even with a relatively dumb translation tool.
This is called obfuscation and, while not perfect, it's sometimes adequate.
Remember, you can't stop the determined hacker any more than the determined burglar. What you're trying to do is make things very hard for the casual attacker. When presented with the symbols O001l1ll10O, O001llll10O, OO01l1ll10O, O0Ol11ll10O and O001l1ll1OO, and code that doesn't seem to do anything useful, most people will just give up.
First you can't avoid people reverse engineering your code. The JVM bytecode has to be plain to be executed and there are several programs to reverse engineer it (same applies to .NET CLR). You can only make it more and more difficult to raise the barrier (i.e. cost) to see and understand your code.
Usual way is to obfuscate the source with some tool. Classes, methods and fields are renamed throughout the codebase, even with invalid identifiers if you choose to, making the code next to impossible to comprehend. I had good results with JODE in the past. After obfuscating use a decompiler to see what your code looks like...
Next to obfuscation you can encrypt your class files (all but a small starter class) with some method and use a custom class loader to decrypt them. Unfortunately the class loader class can't be encrypted itself, so people might figure out the decryption algorithm by reading the decompiled code of your class loader. But the window to attack your code got smaller. Again this does not prevent people from seeing your code, just makes it harder for the casual attacker.
You could also try to convert the Java application to some windows EXE which would hide the clue that it's Java at all (to some degree) or really compile into machine code, depending on your need of JVM features. (I did not try this.)
GCJ is a free tool that can compile to either bytecode or native code. Keeping in mind, that does sort of defeat the purpose of Java.
A little late I know, but the answer is no.
Even if you write in C and compile to native code, there are dissasemblers / debuggers which will allow people to step through your code. Granted - debugging optimized code without symbolic information is a pain - but it can be done, I've had to do it on occasion.
There are steps that you can take to make this harder - e.g. on windows you can call the IsDebuggerPresent API in a loop to see if somebody is debugging your process, and if yes and it is a release build - terminate the process. Of course a sufficiently determined attacker could intercept your call to IsDebuggerPresent and always return false.
There are a whole variety of techniques that have cropped up - people who want to protect something and people who are out to crack it wide open, it is a veritable arms race! Once you go down this path - you will have to constantly keep updating/upgrading your defenses, there is no stopping.
This not my practical solution but , here i think good collection or resource and tutorials for making it happen to highest level of satisfaction.
A suggestion from this website (oracle community)
(clean way), Obfuscate your code, there are many open source and free
obfuscator tools, here is a simple list of them : [Open source
obfuscators list] .
These tools make your code unreadable( though still you can decompile
it) by changing names. this is the most common way to protect your
code.
2.(Not so clean way) If you have a specific target platform (like windows) or you can have different versions for different platforms,
you can write a sophisticated part of your algorithms in a low level
language like C (which is very hard to decompile and understand) and
use it as a native library in you java application. it is not clean,
because many of us use java for it's cross-platform abilities, and
this method fades that ability.
and this one below a step by step follow :
ProtectYourJavaCode
Enjoy!
Keep your solutions added we need this more.

Reflection: for frameworks only?

Somebody that I work with and respect once remarked to me that there shouldn't be any need for the use of reflection in application code and that it should only be used in frameworks. He was speaking from a J2EE background and my professional experience of that platform does generally bear that out; although I have written reflective application code using Java once or twice.
My experience of Ruby on Rails is radically different, because Ruby pretty much encourages you to write dynamic code. Much of what Rails gives you simply wouldn't be possible without reflection and metaprogramming and many of the same techniques are equally as applicable and useful to your application code.
Do you agree with the viewpoint that reflection is for frameworks only? I'd be interested to hear your opinions and experiences.
There's the old joke that any sufficiently sophisticated system written in a statically-typed language contains an incomplete, inferior implementation of Lisp.
Since your requirements tend to become more complicated as a project evolves, you often eventually find that the common idioms in statically-typed object systems eventually hit a wall. Sometimes reaching for reflection is the best solution.
I'm happy in dynamically-typed languages like Ruby, and statically-typed languages like C#, but the implicit reflection in Ruby often makes for simpler, easier-to-read code. (Depending on the metaprogramming magic required, sometimes harder to write).
In C#, I've found problems that couldn't be solved without reflection, because of information I didn't have until runtime. One example: When trying to manipulate some third-party code that generated proxies to Silverlight objects running in another process, I had to use reflection to invoke a specific strongly-typed "Generic" version of a method, because the marshalling required the caller to make an assumption about the type of the object in the other process was in order to extract the data we needed from it, and C# doesn't allow the "type" of the generic method invocation to be specified at run time (except with reflection techniques). I guess you could argue our tool was kind of a framework, but I could easily imagine a case in an ordinary application facing a similar problem.
Reflection makes DRY a lot easier. It's certainly possible to write DRY code without reflection, but it's often much more verbose.
If some piece of information is encoded in my program in one way, why wouldn't I use reflection to get at it, if that's the easiest way?
It sounds like he's talking about Java specifically. And in that case, he's just citing a special case of this: in Java, reflection is so wonky it's almost never the easiest way to do something. :-) In other languages like Ruby, as you've seen, it often is.
Reflection is definitely heavily used in frameworks, but when used correctly can help simplify code in applications.
One example I've seen before is using a JDK Proxy of a large interface (20+ methods) to wrap (i.e. delegate to) a specific implementation. Only a couple of methods were overridden using a InvocationHandler, the rest of the methods were invoked via reflection.
Reflection can be useful, but it is slower that doing a regular method call. See this reflection comparison.
Reflection in Java is generally not necessary. It may be the quickest way to solve a certain problem, but I would rather work out the underlying problem that causes you to think it's necessary in app code. I believe this because it frequently pushes errors from compile time to run time, which is always a Bad Thing for large enough software that testing is non-trivial.
I disagree, my application uses reflection to dynamically create providers. I might also use reflection to control logic flow, if the logic is simple and doesn't warrant a more complicated pattern.
In C# I use reflection to grab attributes off Enumeration which help me determine how to display an enumeration to an end user.
I disagree, reflection is very useful in application code and I find myself using it quite often. Most recently, I had to use reflection to load an assembly (in order to investigate its public types) from just the path of the assembly.
Several opinions on this subject are expressed here...
What is reflection and why is it useful?
Use reflection when there is no other way! This is a matter of performance!
If you have looked into .NET performance pitfalls before, it might not surprise you how slow the normal reflection is: a simple test with repeated access to an int property proved to be ~1000 times slower using reflection compared to the direct access to the property (comparing the average of the median 80% of the measured times).
See this: .NET reflection - performance
MSDN has a pretty nice article about When Should You Use Reflection?
If your problem is best solved by using reflection, you should use it.
(Note that the definition of 'best' is something learnt by experience :)
The definition of framework vs. application isn't all that black & white either. Sometimes your app needs a bit of framework to do its job well.
I think the observation that there shouldn't be any need for the use of reflection in application code and that it should only be used in frameworks is more or less true.
On the spectrum of how coupled some piece of code are, code joined by reflection are as loosely coupled as they come.
As such, the code which is doing it's job via reflection can quite happily fulfil it's role in life knowing not-a-thing about the code which is using it.

Figuring out the right language for the job: branching out from C#

I work in a Microsoft environment, so I can use my C# hammer on any nails I come across. That being said, what languages (compiled, interpreted, scripting, functional, any types!) complement knowing C#, and for what purposes? For example, I've moved a lot of script functionality away from compiled console apps and into Powershell scripts. If you're an MS developer, have you found a niche in your world for other languages like F#, IronRuby, IronPython, or something similar, and what niche do they fill?
Note: this question is directed at the Microsoft dev people since I can't run off and start installing LAMP stacks around my company, and therefore having to support it forever. :) However, feel free to mention any other languages that you found interesting to fulfill a certain task/role in your world apart from your main language.
Python/Perl/Ruby/PowerShell are great supplements to C#/VB.NET. If your boss hands you a text file and says insert it into the database once or twice, then any of Perl/Python/Ruby (I'm not sure about powershell but I imagine it is not that much more difficult) should be fine to parse it. Either way, for your main applications you will probably be stuck in C#. You can use one of the more dynamic languages to do code generation in C#.
Since you are in a Microsoft Environment, probably your best chance at getting your solution accepted is PowerShell. Next to that I'd say IronPython or something else that integrates with the CLR. But main issue is that for someone else to maintain what you do, they would have to know whatever language you are using. MS in the future has plans to use PowerShell a lot more, so it is probably easier to justify PowerShell then say Python/Perl/Ruby.
If you are just processing a text file for one time use. Or creating a one time code generator to generate all the code and then intend to maintain the generated code, then it doesn't matter. You are the one who will consume the results and if you save time using Perl then more power to you. But if you are doing something that will be used over and over again (like an active code generator where you change the templates and run the generator instead of maintaining the generated code) then other developers working on what you did will need to know the language you used. It is much harder to argue learning Perl/Ruby/Python in a Microsoft Shop. But PowerShell seems like the easier argument. I think the MS grand plan is that eventually applications will expose more functionality for power shell through commandlets. Assuming this happens then PowerShell is even more of a no-brainer because it will expose tons of scriptable functionality that you won't get any other way.
A nice scripting language is always a good tool to have on you belt. See Ruby, or Python.
I use python for prototyping, since there's almost no turn around time between edits and actually running the new version of the code. I may even end up using it for a real project - the more I use it, the more I like it.
It will take some getting used to as a C# programmer, though - the indentation-defines-structure system it uses is a little weird at first.
Since you are in a MS shop, I would suggest PowerShell as a decent scripting language to learn. It plays well with C#.
I'm a big fan of Ruby too.
I'd like to second or third python. Specifically, IronPython (ttp://www.codeplex.com/Wiki/View.aspx?ProjectName=IronPython) lets you learn python but also gives you access to the .net framework goodies.
It's quite nice for scripting-related tasks so it'll probably be useful for your day-to-day coding life, and also a nice way to muck around in an experimental coding/prototyping way.
While it's a bit of a fringe language, I'm compelled to mention Erlang. Erlang is an excellent language to have in your toolbox since it's unusual strengths tend to compliment other programming platforms. Erlang is very useful for building distributed, concurrent, fault-tolerant systems. It's used a lot in the instant-messaging and telephony world where there's a need for distributed, yet interconnected architectures.
Maybe play around with Boo and see what you think.
Boo at Codehaus.org
Boo at Wikipedia
If your using the .Net framework the language is really not important as the compiler and interpreter create the same IL code in any case.
If you step away from the .Net world, I am of the opinion that development tools and languages are a tool box. I strive to use the right tools for the job at hand, taking into consideration what the skill base of the other developers are and what direction a company is seeking of course (I'm a consultant).
I'm with jjnguy. Try one of the scripting languages. Plus as a bonus, when you learn Ruby/PythonPearl, etc...it's a gateway drug...err language to developing for other environments.
Trying out languages outside of your normal toolbox will give you new ways of approaching things in your current favorite language. Even if you don't use them for serious projects languages like Perl (for data mangling), Lisp (functional programming), and Javascript (prototype based programming) will teach you new ways to think about problems in your current language.
As a web developer by trade, you might look into the XSLT/XPath family as for certain types of XML processing they can be very powerful tools.
Granted, in C# 3.x Linq2Xml exposes some similar functionality inline.
XSLT, however, can be a powerful way to separate data from presentation in your apps.
I am very interested in F# and some of the other new languages in the CLR/DLR. The DLR languages might be a lot better for your UI, because they don't make you cast a lot of stupid things.
However, I think that it is important to keep in mind tha learning a new language, especially in a new area, like functional programming, is always a good way to re-train your mind so that you are exposed to new concepts and you can code better in your language of choice, even if you never use those new languages.
Check out Boo - it runs on top of the .NET stack, but its syntax is more like Python.
To learn a new language that complements C#, I'd go with C++. You can use it in a 'way better than p/invoke' style to get access to unmanaged code from your C# apps. You can then start using it to write the memory-constrained apps, and/or performance-critical bits in, if you find some of your .NET applications start hogging all the RAM and/or CPU or just generally aren't as fast as you'd like.

Migrating from ASP Classic to .NET and pain mitigation

We're in the process of redesigning the customer-facing section of our site in .NET 3.5. It's been going well so far, we're using the same workflow and stored procedures, for the most part, the biggest changes are the UI, the ORM (from dictionaries to LINQ), and obviously the language. Most of the pages to this point have been trivial, but now we're working on the heaviest workflow pages.
The main page of our offer acceptance section is 1500 lines, about 90% of that is ASP, with probably another 1000 lines in function calls to includes. I think the 1500 lines is a bit deceiving too since we're working with gems like this
function GetDealText(sUSCurASCII, sUSCurName, sTemplateOptionID, sSellerCompany, sOfferAmount, sSellerPremPercent, sTotalOfferToSeller, sSellerPremium, sMode, sSellerCurASCII, sSellerCurName, sTotalOfferToSeller_SellerCurr, sOfferAmount_SellerCurr, sSellerPremium_SellerCurr, sConditions, sListID, sDescription, sSKU, sInv_tag, sFasc_loc, sSerialNoandModel, sQTY, iLoopCount, iBidCount, sHTMLConditions, sBidStatus, sBidID, byRef bAlreadyAccepted, sFasc_Address1, sFasc_City, sFasc_State_id, sFasc_Country_id, sFasc_Company_name, sListingCustID, sAskPrice_SellerCurr, sMinPrice_SellerCurr, sListingCur, sOrigLocation)
The standard practice I've been using so far is to spend maybe an hour or so reading over the app both to familiarize myself with it, but also to strip out commented-out/deprecated code. Then to work in a depth-first fashion. I'll start at the top and copy a segment of code in the aspx.cs file and start rewriting, making obvious refactorings as I go especially to take advantage of our ORM. If I get a function call that we don't have, I'll write out the definition.
After I have everything coded I'll do a few passes at refactoring/testing. I'm just wondering if anyone has any tips on how to make this process a little easier/more efficient.
Believe me, I know exactly where you are coming from.. I am currently migrating a large app from ASP classic to .NET.. And I am still learning ASP.NET! :S (yes, I am terrified!).
The main things I have kept in my mind is this:
I dont stray too far from the current design (i.e. no massive "lets rip ALL of this out and make it ASP.NET magical!) due to the incredibly high amount of coupling that ASP classic tends to have, this would be very dangerous. Of course, if you are confident, fill your boots :) This can always be refactored later.
Back everything up with tests, tests and more tests! I am really trying hard to get into TDD, but its very difficult to test existing apps, so every time I remove a chunk of classic and replace with .NET, I ensure I have as much green-light tests backing me as possible.
Research a lot, there are some MAJOR changes between classic and .NET and sometimes what can be many lines of code and includes in classic can be achieved in a few lines of code, think before coding.. I've learnt this the hard way, several times :D
Its very much like playing Jenga with your code :)
Best of luck with the project, any more questions, then please ask :)
After I have everything coded I'll do a few passes at refactoring/testing. I'm just wondering if anyone has any tips on how to make this process a little easier/more efficient.
(source: cartoonstock.com)
Normally I'm not a fan of TDD, but in the case of refactoring it really is the way to go.
Write some tests first which verify what the bit you're looking at is actually doing. Then refactor. This is a LOT more reliable than just 'it looks like it still works.'
The other huge benefit to this is that when you're refactoring something which is further down the page, or in a shared library or something, you can just re-run the tests, as opposed to finding out the hard way that a seemingly unrelated change was actually related
You're going from classic ASP to ASP with 3.5 without just re-writing? Skillz. I've had to deal with some legacy ASP #work and I think it's just easier to parse it and re-write it.
A 1500-line ASP page? With lots of calls out to include files? Don't tell me -- the functions don't have any naming convention that tells you which include file has their implementation... That brings back memories (shudder)...
It sounds to me like you have a pretty solid approach -- I'm not sure if there is any magical way to mitigate your pain. After your conversion effort, the architecture of your app will still be messy and UI-heavy (i.e. code-behind running workflows), and it will probably still be fairly painful to maintain, but the refactoring you are doing should definitely help.
I hope you have weighed the upgrade you are doing against just rewriting from scratch -- as long as you are not intending to extend the app too much and you are not primarily responsible for maintaining the app, upgrading a complex workflow-based app like you are doing may be cheaper and a better choice than rewriting it from scratch. ASP.NET should give you better opportunities to improve performance and scalability, at least, than Classic ASP. From your question I imagine that it is too late in the process for that discussion anyway.
Good luck!
Sounds like you have a pretty good handle on things. I've seen a lot of people try to do a straight-line transliteration, includes and all, and it just doesn't work. You need to have a good understanding of how ASP.Net wants to work, because it's much different from Classic ASP, and it sounds like maybe you have that.
For larger files, I'd try to get a higher level view first. For example, one thing I've noticed is that Classic ASP was horrible about function calls. You'd be reading through some code and find a call to a function with no clue as to where it might be implemented. As a result, Classic ASP code tended to have long functions and scripts to avoid those nasty jumps. I remember seeing a function that printed out to 40 pages! Parsing straight through that much code is no fun.
ASP.Net makes it easier to follow function calls around, so you might start by breaking out your larger code blocks into several smaller functions.
Don't tell me -- the functions don't
have any naming convention that tells
you which include file has their
implementation... That brings back
memories (shudder)...
How did you guess? ;)
I hope you have weighed the upgrade
you are doing against just rewriting
from scratch -- as long as you are not
intending to extend the app too much
and you are not primarily responsible
for maintaining the app, upgrading a
complex workflow-based app like you
are doing may be cheaper and a better
choice than rewriting it from scratch.
ASP.NET should give you better
opportunities to improve performance
and scalability, at least, than
Classic ASP. From your question I
imagine that it is too late in the
process for that discussion anyway.
This was something we talked about. Based on timing (trying to beat a competitor's site to launch) and resources (basically two developers) it made sense to not nuke the site from orbit. Things have actually gone much better than I expected. We were aware even from the planning stages that this code was going to give us the most problems. You should see the revision history of the classic ASP pages involved, it's a bloodbath.
For larger files, I'd try to get a
higher level view first. For example,
one thing I've noticed is that Classic
ASP was horrible about function calls.
You'd be reading through some code and
find a call to a function with no clue
as to where it might be implemented.
As a result, Classic ASP code tended
to have long functions and scripts to
avoid those nasty jumps. I remember
seeing a function that printed out to
40 pages! Parsing straight through
that much code is no fun.
I've actually had this displeasure of working with the legacy code quite a bit so I have a decent high level understanding of the system. You're right about the function length, there are some routines (most I've refactored down into much smaller ones) that are 3-4x as long as any of the aspx pages/helper classes/ORMs on the new site.
I once came across a .Net app that was ported from ASP. The .aspx pages were totally blank. To render the UI, the developers used StringBuilders in the code behind and then did a response.write. This would be the wrong way to do it!
I once came across a .Net app that was ported from ASP. The .aspx pages were totally blank. To render the UI, the developers used StringBuilders in the code behind and then did a response.write. This would be the wrong way to do it!
I've seen it done the other way, the code behind page was blank, except for declaration of globals, then the VBScript was left in the ASPX.

Resources