Get AST from .Net assembly without source code (IL code) - abstract-syntax-tree

I'd like to analyze .NET assemblies in a way that is independent of the source language (C#, VB.NET, or whatever).
I know Roslyn and NRefactory, but they only seem to work at the C# source code level.
There is also the "Common Compiler Infrastructure: Code Model and AST API" project on CodePlex, which claims that it "supports a hierarchical object model that represents code blocks in a language-independent structured form", which sounds like exactly what I am looking for.
However, I am unable to find any useful documentation or code that actually does this.
Any advice on how to achieve this?
Could Mono.Cecil perhaps do something like this?

You can do this, and there is also one (although tiny) example of it in the ILSpy source.
var assembly = AssemblyDefinition.ReadAssembly("path/to/assembly.dll");
var astBuilder = new AstBuilder(new DecompilerContext(assembly.MainModule));
astBuilder.AddAssembly(assembly);
astBuilder.SyntaxTree...

The CCI Code Model is somewhere between an IL disassembler and a full C# decompiler: it gives your code some structure (e.g. if statements and expressions), but it also contains some low-level stack operations like push and pop.
CCI contains a sample that shows this: PeToText.
For example, to get Code Model for the first method of the Program type (in the global namespace), you could use code like this:
string fileName = "whatever.exe";
using (var host = new PeReader.DefaultHost())
{
    var module = (IModule)host.LoadUnitFrom(fileName);
    var type = (ITypeDefinition)module.UnitNamespaceRoot.Members
        .Single(m => m.Name.Value == "Program");
    var method = (IMethodDefinition)type.Members.First();
    var methodBody = new SourceMethodBody(method.Body, host, null, null);
}
To demonstrate, if you decompile the above code and show it using PeToText, you're going to get:
Microsoft.Cci.ITypeDefinition local_3;
Microsoft.Cci.ILToCodeModel.SourceMethodBody local_5;
string local_0 = "C:\\code\\tmp\\nuget tmp 2015\\bin\\Debug\\nuget tmp 2015.exe";
Microsoft.Cci.PeReader.DefaultHost local_1 = new Microsoft.Cci.PeReader.DefaultHost();
try
{
push (Microsoft.Cci.IModule)local_1.LoadUnitFrom(local_0).UnitNamespaceRoot.Members;
push Program.<>c.<>9__0_0;
if (dup == default(System.Func<Microsoft.Cci.INamespaceMember, bool>))
{
pop;
push Program.<>c.<>9.<Main0>b__0_0;
Program.<>c.<>9__0_0 = dup;
}
local_3 = (Microsoft.Cci.ITypeDefinition)System.Linq.Enumerable.Single<Microsoft.Cci.INamespaceMember>(pop, pop);
local_5 = new Microsoft.Cci.ILToCodeModel.SourceMethodBody((Microsoft.Cci.IMethodDefinition)System.Linq.Enumerable.First<Microsoft.Cci.ITypeDefinitionMember>(local_3.Members).Body, local_1, (Microsoft.Cci.ISourceLocationProvider)null, (Microsoft.Cci.ILocalScopeProvider)null, 0);
}
finally
{
if (local_1 != default(Microsoft.Cci.PeReader.DefaultHost))
{
local_1.Dispose();
}
}
Of note are all those push, pop and dup statements and the lambda caching condition.

As far as I know, it's not possible to build an AST from a binary (without sources), since the AST itself is generated by the parser from source code as part of the compilation process.
Mono.Cecil won't help, because it only lets you modify opcodes and metadata, not analyze the assembly.
But since it's .NET, you can dump the IL code from the DLL with ildasm. Then you can pass the generated IL source to any parser that has a CIL grammar hooked up and get an AST from that parser. The problem is that, as far as I know, there is only one publicly available CIL grammar, so you don't really have a choice. And ECMA-335 is big enough that writing your own grammar is a bad idea.
So I can suggest only one solution:
Pass the assembly to ildasm.exe to get CIL.
Then pass the CIL to an ANTLR v3 parser with this CIL grammar wired up (note that it's a little outdated: the grammar was created in 2004 and the latest CIL specification is from 2006, but CIL doesn't really change much).
After that you can freely access the AST generated by ANTLR.
Note that you will need ANTLR v3, not v4, since the grammar was written for version 3 and it's hard to port it to v4 without good knowledge of ANTLR syntax.
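The first step of that pipeline can be scripted. The following is a minimal Python sketch, assuming ildasm.exe is on the PATH and that its /out= option is used to write the disassembly to a file; the function names are hypothetical, and the ANTLR step is not shown.

```python
import subprocess
from pathlib import Path
from typing import List


def ildasm_command(assembly: Path, il_out: Path, ildasm: str = "ildasm.exe") -> List[str]:
    """Build the command line that dumps an assembly to a textual .il file."""
    return [ildasm, str(assembly), f"/out={il_out}"]


def dump_il(assembly: Path, il_out: Path) -> str:
    """Run ildasm (assumed to be on PATH) and return the generated CIL text."""
    subprocess.run(ildasm_command(assembly, il_out), check=True)
    # Undecodable bytes are replaced rather than raising, since the
    # output encoding depends on how ildasm was invoked.
    return il_out.read_text(encoding="utf-8", errors="replace")
```

The text returned by `dump_il` is what would then be handed to the CIL parser.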
You can also try looking into the sources of Microsoft's new RyuJIT compiler on GitHub (part of CoreCLR). I'm not sure that it helps, but in theory it must contain CIL grammar and parser implementations, since it works with CIL code. But it's written in C++, has an enormous code base, and lacks documentation since it's in active development, so it may be easier to stick with ANTLR.

If you treat the .net binary file as a stream of bytes, you ought to be able to "parse" it just fine.
You simply write a grammar whose tokens are essentially bytes. You can certainly build a classical lexer/parser with almost any set of lexer/parser tools by defining the lexer to read single bytes as tokens.
You can then build the AST using standard AST-building machinery for the parsing engine (on your own for YACC, automatically with ANTLR4).
What you will discover, of course, is that "parsing" isn't enough; you'll still need to build symbol tables, and carry out control and data flow analyses if you are going to do serious analysis of the corresponding code. See my essay on LifeAfterParsing.
You will also likely have to take into account "distinguished" functions that provide key runtime facilities to the particular programming languages that actually generated the CIL code. And these will make your analyzers language-dependent. Yes, you still get to share the part of the analysis that works on generic CIL.
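As a minimal sketch of the byte-level idea (not a real CIL parser), the Python below treats each byte of a method body as a token and folds a handful of opcodes into an expression tree by simulating the evaluation stack. The opcode values are the single-byte encodings from ECMA-335; everything else (the `Node` type, `build_tree`, the tiny opcode table) is hypothetical, and method headers, operands, branches, and metadata tokens are ignored.

```python
from dataclasses import dataclass, field
from typing import List

# A tiny subset of single-byte CIL opcodes (values per ECMA-335).
LDARG = {0x02: 0, 0x03: 1, 0x04: 2, 0x05: 3}   # ldarg.0 .. ldarg.3
BINARY = {0x58: "add", 0x59: "sub", 0x5A: "mul"}
NOP, RET = 0x00, 0x2A


@dataclass
class Node:
    """Hypothetical AST node: an operator name plus child nodes."""
    op: str
    children: List["Node"] = field(default_factory=list)

    def __str__(self) -> str:
        if not self.children:
            return self.op
        return f"{self.op}({', '.join(str(c) for c in self.children)})"


def build_tree(il: bytes) -> Node:
    """Fold a straight-line IL byte sequence into one expression tree."""
    stack: List[Node] = []
    for byte in il:
        if byte == NOP:
            continue
        if byte in LDARG:
            stack.append(Node(f"arg{LDARG[byte]}"))
        elif byte in BINARY:
            # The right operand was pushed last, so it is popped first.
            right, left = stack.pop(), stack.pop()
            stack.append(Node(BINARY[byte], [left, right]))
        elif byte == RET:
            return Node("ret", [stack.pop()] if stack else [])
        else:
            raise ValueError(f"opcode 0x{byte:02X} is outside this sketch")
    raise ValueError("method body ended without ret")


# ldarg.0; ldarg.1; add; ret: the body of a static two-argument Add method
tree = build_tree(bytes([0x02, 0x03, 0x58, 0x2A]))
print(tree)  # prints ret(add(arg0, arg1))
```

This only covers straight-line code. Once branches, locals, and calls are involved, the symbol tables and flow analyses mentioned above become necessary.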

Related

Is there a way to import multiple enumerands in IBM Rhapsody?

I have an enumerand of around 150 entries, which I need to get into IBM Rhapsody.
Doing this by hand is clearly lengthy and error prone. I have googled extensively but found only things that tell me how to edit the generated code -- not go the other way.
The question is: How is this done? And if there is no way -- please someone post that as an answer.
David,
I would jump into the Java API (plugin subsystem) and do it that way. If you haven't learned how to use the API, there is a bit of a learning curve. There are two ways to go about it: implement a Java (or your favorite JVM language; I use Scala) app that realizes the Rhapsody Plugin framework, then package it up and deploy it so that it gets loaded when you load your model; or, if it is a one-off job, do everything up to the point of packaging and then run it from within your IDE. If you are comfortable with Scala, I can post some code.
So what I did in the end was edit the relevant .sbs file: I used a small Python program to generate the items I required, and then updated the length of the array accordingly.
import uuid

# Each entry is "<name> = <value>"; the remaining literals follow the same form.
all_the_literals = ["enum_name = 0x4e", "enum_name2 = 0xF2"]  # ... and so on
for field1, waste, field1_value in map(lambda x: x.split(" "),
                                       all_the_literals):
    # One IEnumerationLiteral block per entry, each with a fresh GUID.
    literal_string = f""" {{ IEnumerationLiteral
- _id = GUID {uuid.uuid4()};
- _name = \"{field1}\";
- codeUpdateCGTime = 5.16.2022::19:24:18;
- _modifiedTimeWeak = 5.16.2022::19:24:18;
- _value = \"{field1_value}\";
}}"""
    print(literal_string)
Note the above "code" snippet purely prints the items, which you then copy-paste into the relevant field in the sbs file. YMMV -- this was the correct format for an enum in Rhapsody (and note how I fudged the update time, but it worked successfully, so you'll need to do the same if you use this answer).
Also note it's probably better to use bauhaus9's answer, but I definitely didn't have time for it.

File Exists and Exception Handling in U-Sql

Two questions
How can I check whether a file exists before EXTRACT?
We have a scenario where a new input file is generated every day for catalog data. We need to merge the new input with the d-1 file. Before the merge, we want to make sure that the new input file exists at the source location.
Does U-SQL support a try...catch block?
Regarding checking whether a file exists: we recently released a compile-time IF statement that can indeed check for partition existence (other objects such as files and tables are on the roadmap).
Once that feature is released (still one or two refreshes out at the time of this answer), it may look something like this (syntax subject to change):
IF FILE.EXISTS("/mydir/myfile.csv") THEN
    @data = EXTRACT ... FROM "/mydir/myfile.csv" USING ...;
    ...
    @jobstate = SELECT * FROM (VALUES("job completed")) AS T(status);
ELSE
    @jobstate = SELECT * FROM (VALUES("file not ready. Job not executed.")) AS T(status);
END;
OUTPUT @jobstate TO "/jobs/myjobstate.csv" USING Outputters.Csv();
You will be able to provide the name as a parameter as well. Please let me know if that will work for your scenario.
Another alternative is to use the file set syntax, especially if you want to use a dynamic value to select the input. If no file matches, that simply produces an empty rowset:
@data = EXTRACT ..., date DateTime
        FROM "/mydir/{date:yyyy}/{date:MM}/{date:dd}/data.csv"
        USING ...;
@data = SELECT * FROM @data WHERE date == DateTime.Now.AddDays(-1);
... // continue processing @data that is empty if yesterday's file is not yet there
Having said that, you may want to check whether your job orchestration framework (such as ADF) is a better place to check for existence before submitting the job in the first place.
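A minimal sketch of such a pre-submission check, in Python rather than U-SQL: it builds yesterday's path from the same year/month/day layout as the file set pattern and only proceeds if the file is present. It assumes a locally reachable directory and a hypothetical `submit_job` callback; a real orchestrator such as ADF would use its own existence check against the store.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import Callable, Optional


def daily_input_path(root: Path, day: date) -> Path:
    """Mirror the {date:yyyy}/{date:MM}/{date:dd}/data.csv layout under root."""
    return root / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}" / "data.csv"


def submit_if_ready(root: Path, submit_job: Callable[[Path], None],
                    today: Optional[date] = None) -> bool:
    """Call submit_job (hypothetical) only when yesterday's file exists."""
    today = today or date.today()
    path = daily_input_path(root, today - timedelta(days=1))
    if not path.is_file():
        return False  # input not there yet; do not submit the job
    submit_job(path)
    return True
```

Passing `today` explicitly makes the check easy to test and to re-run for a past date.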
As to the try catch block: U-SQL itself is a script-level optimizable, declarative language where the plan gets generated and optimized at runtime over the whole script. Thus providing a dynamic TRY-CATCH is currently not available, since it would severely impact the ability to optimize the script (e.g., you cannot move predicates or column pruning outside of a try-catch block). Also TRY/CATCH can lead to some very hard to understand and debug code, especially if it is used to mimic procedural workflows in an otherwise declarative environment.
However, you can use try/catch inside your C# functions without problems if you need to catch C# runtime errors.
FILE.EXISTS() always returns True when executed locally. However, it works when executing against Azure Data Lake.
I tried the MSDN example, and the following returns True, True:
DECLARE @filepath_good = "/Samples/Data/SearchLog.tsv";
DECLARE @filepath_bad = "/Samples/Data/zzz.tsv";
@result =
    SELECT FILE.EXISTS(@filepath_good) AS exists_good,
           FILE.EXISTS(@filepath_bad) AS exists_bad
    FROM (VALUES (1)) AS T(dummy);
OUTPUT @result
TO "/Output/FileExists.txt"
USING Outputters.Csv();
I have Microsoft Azure Data Lake Tools for Visual Studio version 2.2.5000.0

Why is this code added to MetadataTypesHandler.ProcessRequest

Why is this code added to MetadataTypesHandler.ProcessRequest() in ORMLite for ServiceStack?
httpRes.ContentType = "application/x-ssz-metatypes";
var encJson = CryptUtils.Encrypt(EndpointHostConfig.PublicKey, json, RsaKeyLengths.Bit2048);
httpRes.Write(encJson);
Looks like it's signing the page but the content type is non-standard.
That code lives in the ServiceStack project itself; it has nothing to do with OrmLite. It is essentially the beginning of a future feature that will provide enough metadata to code-gen DTOs locally as a substitute for sharing DLLs. Because it involves code-gen from a remote source, it's encrypted to give us better security and control over how and what gets code-gen'ed.

Is it possible to intermix Modular templating and legacy VBScript CT?

In particular, the case I have in mind is this:
@@RenderComponentPresentation(Component, "<vbs-legacy-ct-tcm-uri>")@@
The problem I'm having is that in my case VBS code breaks when it tries to access component fields, giving "Error 13 Type mismatch ..".
(So, if I were to give the answer, I'd say: "Partially, of no practical use")
EDIT
The DWT above is from another CT, so effectively it's rendering a component link; that's why the parameterless overload suggested by Nuno won't work, unfortunately. BTW, the following lines inside the VBS don't break and give correct values:
WriteOut Component.ID
WriteOut Component.Schema.Title
EDIT 2
Dominic was absolutely right: it's missing dependencies.
A bit more insight to make this info generally useful:
Suppose, the original CT looked like this ("VBScript [Legacy]" type):
[%
Call RenderComponent(Component)
%]
This CT was meant to be called from a PT, also VBS-based. That PT had a big chunk of "#include" statements at the beginning.
Now the story changes: the same CT is being called from another, DWT-based, CT. Obviously (thank you all for your invaluable help!), the dependencies are no longer being included anywhere.
The solution to make the original CT work again is to explicitly hand-pick and include all the necessary VBS TBBs, so the original CT becomes:
[%
#include "tcm:<uri-of-vbs-tbb>"
Call RenderComponent(Component)
%]
Yes - it's perfectly possible to mix and match legacy and modular templates. Perhaps obviously, you can't mix and match template building blocks between the two techniques.
In VBScript, "Error 13 Type mismatch" is sometimes used as a secret code that really means "I don't recognise the name of one of your variables (including the names of Functions and Subs)". In the VBScript templating engine, variables from the page template could be in scope in your component template; it was very common, for example, to put the #includes in the PT so they could be used by the CT. My guess is that your component template is trying to use such a Function and not finding it.
I know that you can render a Modular Page Template with VBScript Component Presentations, and that a VBScript page template can also render a modular Component Template.
Your error is possibly due to something else? Have you tried just using the regular @@RenderComponentPresentation()@@ call without specifying which template?
The Page Template can render Compound Templates of different flavors - for example Razor, VBS, or XSLT.
The problem comes from the TBBs included in the Templates. Often the Razor templates will need to call functions that only exist in VBScript. So, the starting point when migrating templates is always to start with the helper functions and utility libraries. Then migrate the most generic PT / CT you have to the new format (Razor, XSLT, DWT, etc). This provides a nice basis to migrate the rest of the Templates as you have time to the new format.

Are there solutions for streamlining the update of legacy code in multiple places?

I'm working in some old code which was originally designed for handling two different kinds of files. I was recently tasked with adding a new kind of file to this code. Most of my problems were solved by filling out an extensive XML file with a new entry that handled everything from what lists were named to how the file is written in plural lower case. But this ended up being insufficient, as there were maybe 50 different places in 24 different code files where I had to update hardcoded switch-statements that only branched for the original two file types.
Unfortunately there is no consistency in this; there are methods which operate half from the XML file, and half off of hardcode. Some of the files which look like they would operate off of the XML file don't, and some that I would expect that I'd need to update the hardcode don't need it. So the only way to find the majority of these is to run through testing the whole system when only part of it is operational, finding that one step to fix (when I'm lucky that error logging actually tells me what is going on), and then running the whole thing again. This wastes time testing the parts of the code which are already confirmed to work, time better spent testing the new parts I have to add on top of it all.
It's a hassle and a half, and to my luck I can expect that I will have to add yet another new kind of file in the near future.
Are there any solutions out there which can aid in this kind of endeavour? Something which I can input some parameters of current features, document what points in a whole code project actually need to be updated, and run something nice the next time I need to add a new feature to the code. It needn't even be fully automated, something that'll help me navigate straight to the specific points in everything and maybe even record what kind of parameters need to be loaded.
Doubt it matters specifically, but the code is comprised of ASP.NET pages, some ASP.NET controls, hundreds of C# code files, and a handful of additional XML files. It's all currently in a couple big Visual Studio 2008 projects.
Not exactly what you are describing, but if you can introduce a seam into the code and lay down some interfaces you can break out and mock, a suite of unit/integration tests would go a long way to helping you modify old code you may not fully understand well.
I completely agree with the comment about using Michael Feathers' book to learn how to wedge new tests into legacy code. I'd also strongly recommend Refactoring, by Martin Fowler. What it sounds like you need to do for your code is to implement the "Replace conditionals with polymorphism" refactoring.
I imagine your code today looks somewhat like this:
if (filetype == 23)
{
    type23parser.parse(file);
}
else if (filetype == 69)
{
    filestore = type69reader.read(file);
    File newfile = convertFSto23(filestore);
    type23parser.parse(newfile);
}
What you want to do is to abstract away all the "if (type == foo)" kinds of logic into strategy patterns that are created in a factory.
class FileRules
{
private:
    FileReaderRules *pReader;
    FileParserRules *pParser;
public:
    // The factory supplies the concrete reader and parser strategies.
    FileRules(FileReaderRules *reader, FileParserRules *parser)
        : pReader(reader), pParser(parser) {}
    void read(File* inFile) { pReader->read(inFile); }
    void parse(File* inFile) { pParser->parse(inFile); }
};
class FileRulesFactory
{
public:
    static FileRules* GetRules(int inputFiletype, int parserType)
    {
        FileReaderRules *pReader = NULL;
        FileParserRules *pParser = NULL;
        switch (inputFiletype)
        {
        case 23:
            pReader = new ASCIIReader;
            break;
        case 69:
            pReader = new EBCDICReader;
            break;
        }
        switch (parserType)
        ... etc...
Then your main line of code looks like this:
FileRules* rules = FileRulesFactory::GetRules(filetype, parsertype);
rules->read(file);
rules->parse(file);
Pull off this refactoring, and adding a new set of file types, parsers, readers, etc., becomes as simple as writing one exclusive to your new type.
Of course, go read the book. I vastly oversimplified it here, and probably got stuff wrong, but you should get the general idea of how to approach it from this. I can also recommend another book, "Head First Design Patterns", which has a great section on the Factory patterns (if you like those "Head First" kinds of books.)
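To make the idea concrete, here is a minimal Python sketch of the same refactoring with a registry in place of the switch statements. The names (`FileRules`, `register`, `rules_for`, `process`), the file-type numbers, and the stand-in readers and parsers are all hypothetical; it illustrates the pattern and is not a drop-in design for the project in the question.

```python
from typing import Callable, Dict


class FileRules:
    """Strategy object: pairs a reader with a parser for one file type."""

    def __init__(self, read: Callable[[str], str], parse: Callable[[str], list]):
        self.read = read
        self.parse = parse


_REGISTRY: Dict[int, FileRules] = {}


def register(filetype: int, rules: FileRules) -> None:
    """Adding a new file type is one call here instead of many edited switches."""
    if filetype in _REGISTRY:
        raise ValueError(f"file type {filetype} is already registered")
    _REGISTRY[filetype] = rules


def rules_for(filetype: int) -> FileRules:
    """Factory lookup; unknown types fail loudly in one place."""
    try:
        return _REGISTRY[filetype]
    except KeyError:
        raise ValueError(f"no rules registered for file type {filetype}") from None


# Two original types plus a new one, each registered exactly once.
# The string methods stand in for real readers and parsers.
register(23, FileRules(read=str.strip, parse=str.split))
register(69, FileRules(read=str.lower, parse=str.split))
register(77, FileRules(read=str.strip, parse=lambda text: text.split(",")))


def process(filetype: int, raw: str) -> list:
    """The main line no longer branches on the file type."""
    rules = rules_for(filetype)
    return rules.parse(rules.read(raw))


print(process(77, " a,b,c "))  # prints ['a', 'b', 'c']
```

With this shape, a missing registration shows up as a single clear error from `rules_for` rather than as a silent fall-through in one of many switch statements.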

Resources