Use-cases for reflection

Use-cases for reflection - reflection

Recently I was talking to a co-worker about C++ and lamented that there was no way to take a string with the name of a class field and extract the field with that name; in other words, it lacks reflection. He gave me a baffled look and asked when anyone would ever need to do such a thing.
Off the top of my head I didn't have a good answer for him, other than "hey, I need to do it right now". So I sat down and came up with a list of some of the things I've actually done with reflection in various languages. Unfortunately, most of my examples come from my web programming in Python, and I was hoping that the people here would have more examples. Here's the list I came up with:
Given a config file with lines like
x = "Hello World!"
y = 5.0
dynamically set the fields of some config object equal to the values in that file. (This was what I wished I could do in C++, but actually couldn't do.)
When sorting a list of objects, sort based on an arbitrary attribute given that attribute's name from a config file or web request.
When writing software that uses a network protocol, reflection lets you call methods based on string values from that protocol. For example, I wrote an IRC bot that would translate
!some_command arg1 arg2
into a method call actions.some_command(arg1, arg2) and print whatever that function returned back to the IRC channel.
When using Python's __getattr__ function (which is sort of like method_missing in Ruby/Smalltalk) I was working with a class with a whole lot of statistics, such as late_total. For every statistic, I wanted to be able to add _percent to get that statistic as a percentage of the total things I was counting (for example, stats.late_total_percent). Reflection made this very easy.
So can anyone here give any examples from their own programming experiences of times when reflection has been helpful? The next time a co-worker asks me why I'd "ever want to do something like that" I'd like to be more prepared.

I can list following usage for reflection:
Late binding
Security (introspect code for security reasons)
Code analysis
Dynamic typing (duck typing is not possible without reflection)
Metaprogramming
Some real-world usages of reflection from my personal experience:
Developed plugin system based on reflection
Used aspect-oriented programming model
Performed static code analysis
Used various Dependency Injection frameworks
...
Reflection is good thing :)

I've used reflection to get current method information for exceptions, logging, etc.
string src = MethodInfo.GetCurrentMethod().ToString();
string msg = "Big Mistake";
Exception newEx = new Exception(msg, ex);
newEx.Source = src;
instead of
string src = "MyMethod";
string msg = "Big MistakeA";
Exception newEx = new Exception(msg, ex);
newEx.Source = src;
It's just easier for copy/paste inheritance and code generation.

I'm in a situation now where I have a stream of XML coming in over the wire and I need to instantiate an Entity object that will populate itself from elements in the stream. It's easier to use reflection to figure out which Entity object can handle which XML element than to write a gigantic, maintenance-nightmare conditional statement. There's clearly a dependency between the XML schema and how I structure and name my objects, but I control both so it's not a big problem.

There are lot's of times you want to dynamically instantiate and work with objects where the type isn't known until runtime. For example with OR-mappers or in a plugin architecture. Mocking frameworks use it, if you want to write a logging-library and dynamically want to examine type and properties of exceptions.
If I think a bit longer I can probably come up with more examples.

I find reflection very useful if the input data (like xml) has a complex structure which is easily mapped to object-instances or i need some kind of "is a" relationship between the instances.
As reflection is relatively easy in java, I sometimes use it for simple data (key-value maps) where I have a small fixed set of keys. One one hand it's simple to determine if a key is valid (if the class has a setter setKey(String data)), on the other hand i can change the type of the (textual) input data and hide the transformation (e.g simple cast to int in getKey()), so the rest of the application can rely on correctly typed data.
If the type of some key-value-pair changes for one object (e.g. form int to float), i only have to change it in the data-object and its users but don't have to keep in mind to check the parser too. This might not be a sensible approach, if performance is an issue...

Writing dispatchers. Twisted uses python's reflective capabilities to dispatch XML-RPC and SOAP calls. RMI uses Java's reflection api for dispatch.
Command line parsing. Building up a config object based on the command line parameters that are passed in.
When writing unit tests, it can be helpful to use reflection, though mostly I've used this to bypass access modifiers (Java).

I've used reflection in C# when there was some internal or private method in the framework or a third party library that I wanted to access.
(Disclaimer: It's not necessarily a best-practice because private and internal methods may be changed in later versions. But it worked for what I needed.)

Well, in statically-typed languages, you'd want to use reflection any time you need to do something "dynamic". It comes in handy for tooling purposes (scanning the members of an object). In Java it's used in JMX and dynamic proxies quite a bit. And there are tons of one-off cases where it's really the only way to go (pretty much anytime you need to do something the compiler won't let you do).

I generally use reflection for debugging. Reflection can more easily and more accurately display the objects within the system than an assortment of print statements. In many languages that have first-class functions, you can even invoke the functions of the object without writing special code.
There is, however, a way to do what you want(ed). Use a hashtable. Store the fields keyed against the field name.
If you really wanted to, you could then create standard Get/Set functions, or create macros that do it on the fly. #define GetX() Get("X") sort of thing.
You could even implement your own imperfect reflection that way.
For the advanced user, if you can compile the code, it may be possible to enable debug output generation and use that to perform reflection.

Related

Is reflection actually useful apart from reverse engineering?

Languages such as Java and PHP support reflection, which allows objects to provide metadata about themselves. Are there any legitimate use cases where you would need to be able to do something like ask an object what methods it has outside of the realm of reverse engineering? Are any of those use cases actually implemented today?

Reflection is used extensively in Java by frameworks which are leveraged at runtime to operate with other code dynamically. Without reflection, all links between code must be done at compile time (statically).
So, for example, any useful plug-in framework (OSGi, JSPF, JPF), leverages Reflection. Any injection framework (Spring, Guice, etc) leverages Reflection.
Any time you want to write a piece of code that will interact with another piece of code without having that piece of code available when compiling, Reflection is the way forward in Java.
However, this is best left to frameworks and should be encapsulated.

There certainly are good use cases. For example, obtaining developer-provided metadata. Java APIs are increasingly using annotations to provide info about methods/fields/classes and their use. Like input validation, binding to data representations... You could use these at compile-time to generate metadata descriptors and use those, but to do it at runtime would require reflection. Even if you used the metadata descriptors, they'd end up containing things like class, method and field names that'd need to be accessed via reflection.
Another use case: dynamic languages. Take Ruby... It allows you to check up-front whether an object would respond to a method name before trying to call that method. Something like that requires reflection.
Or how about when a class or method name must be provided from outside compiled code, like when selecting an implementation of some API. That's just gonna be a bit of text. Looking up what it resolves to comes down to reflection.

Frameworks like Spring or Hibernate make extensive use of reflection to inspect a class and see the annotations.

Frameworks for debugging, serialization, logging, testing...

API design: is "fault tolerance" a good thing?

I've consolidated many of the useful answers and came up with my own answer below
For example, I am writing a an API Foo which needs explicit initialization and termination. (Should be language agnostic but I'm using C++ here)
class Foo
{
public:
static void InitLibrary(int someMagicInputRequiredAtRuntime);
static void TermLibrary(int someOtherInput);
};
Apparently, our library doesn't care about multi-threading, reentrancy or whatnot. Let's suppose our Init function should only be called once, calling it again with any other input would wreak havoc.
What's the best way to communicate this to my caller? I can think of two ways:
Inside InitLibrary, I assert some static variable which will blame my caller for init'ing twice.
Inside InitLibrary, I check some static variable and silently aborts if my lib has already been initialized.
Method #1 obviously is explicit, while method #2 makes it more user friendly. I am thinking that method #2 probably has the disadvantage that my caller wouldn't be aware of the fact that InitLibrary shouln't be called twice.
What would be the pros/cons of each approach? Is there a cleverer way to subvert all these?
Edit
I know that the example here is very contrived. As #daemon pointed out, I should initialized myself and not bother the caller. Practically however, there are places where I need more information to properly initialize myself (note the use of my variable name someMagicInputRequiredAtRuntime). This is not restricted to initialization/termination but other instances where the dilemma exists whether I should choose to be quote-and-quote "fault tolorent" or fail lousily.

I would definitely go for approach 1, along with an easy-to-understand exception and good documentation that explains why this fails. This will force the caller to be aware that this can happen, and the calling class can easily wrap the call in a try-catch statement if needed.
Failing silently, on the other hand, will lead your users to believe that the second call was successful (no error message, no exception) and thus they will expect that the new values are set. So when they try to do something else with Foo, they don't get the expected results. And it's darn near impossible to figure out why if they don't have access to your source code.

Serenity Prayer (modified for interfaces)
SA, grant me the assertions
to accept the things devs cannot change
the code to except the things they can,
and the conditionals to detect the difference
If the fault is in the environment, then you should try and make your code deal with it. If it is something that the developer can prevent by fixing their code, it should generate an exception.

A good approach would be to have a factory that creates an intialized library object (this would require you to wrap your library in a class). Multiple create-calls to the factory would create different objects. This way, the initialize-method would then not be a part of the public interface of the library, and the factory would manage initialization.
If there can be only one instance of the library active, make the factory check for existing instances. This would effectively make your library-object a singleton.

I would suggest that you should flag an exception if your routine cannot achieve the expected post-condition. If someone calls your init routine twice, and the system state after calling it the second time will be the same would be the same as if it had just been called once, then it is probably not necessary to throw an exception. If the system state after the second call would not match the caller's expectation, then an exception should be thrown.
In general, I think it's more helpful to think in terms of state than in terms of action. To use an analogy, an attempt to open as "write new" a file that is already open should either fail or result in a close-erase-reopen. It should not simply perform a no-op, since the program will be expecting to be writing into an empty file whose creation time matches the current time. On the other hand, trying to close a file that's already closed should generally not be considered an error, because the desire is that the file be closed.
BTW, it's often helpful to have available a "Try" version of a method that might throw an exception. It would be nice, for example, to have a Control.TryBeginInvoke available for things like update routines (if a thread-safe control property changes, the property handler would like the control to be updated if it still exists, but won't really mind if the control gets disposed; it's a little irksome not being able to avoid a first-chance exception if a control gets closed when its property is being updated).

Have a private static counter variable in your class. If it is 0 then do the logic in Init and increment the counter, If it is more than 0 then simply increment the counter. In Term do the opposite, decrement until it is 0 then do the logic.
Another way is to use a Singleton pattern, here is a sample in C++.

I guess one way to subvert this dilemma is to fulfill both camps. Ruby has the -w warning switch, it is custom for gcc users to -Wall or even -Weffc++ and Perl has taint mode. By default, these "just work," but the more careful programmer can turn on these strict settings themselves.
One example against the "always complain the slightest error" approach is HTML. Imagine how frustrated the world would be if all browsers would bark at any CSS hacks (such as drawing elements at negative coordinates).
After considering many excellent answers, I've come to this conclusion for myself: When someone sits down, my API should ideally "just work." Of course, for anyone to be involved in any domain, he needs to work at one or two level of abstractions lower than the problem he is trying to solve, which means my user must learn about my internals sooner or later. If he uses my API for long enough, he will begin to stretch the limits and too much efforts to "hide" or "encapsulate" the inner workings will only become nuisance.
I guess fault tolerance is most of the time a good thing, it's just that it's difficult to get right when the API user is stretching corner cases. I could say the best of both worlds is to provide some kind of "strict mode" so that when things don't "just work," the user can easily dissect the problem.
Of course, doing this is a lot of extra work, so I may be just talking ideals here. Practically it all comes down to the specific case and the programmer's decision.

If your language doesn't allow this error to surface statically, chances are good the error will surface only at runtime. Depending on the use of your library, this means the error won't surface until much later in development. Possibly only when shipped (again, depends on alot).
If there's no danger in silently eating an error (which isn't a real error anyway, since you catch it before anything dangerous happens), then I'd say you should silently eat it. This makes it more user friendly.
If however someMagicInputRequiredAtRuntime varies from calling to calling, I'd raise the error whenever possible, or presumably the library will not function as expected ("I init'ed the lib with value 42, but it's behaving as if I initted with 11!?").

If this Library is a static class, (a library type with no state), why not put the call to Init in the type initializer? If it is an instantiatable type, then put the call in the constructor, or in the factory method that handles instantiation.
Don;t allow public access to the Init function at all.

I think your interface is a bit too technical. No programmer want to learn what concept you have used while designing the API. Programmers want solutions for their actual problems and don't want to learn how to use an API. Nobody wants to init your API, that is something that the API should handle in the background as far as possible. Find a good abstraction that shields the developer from as much low-level technical stuff as possible. That implies, that the API should be fault tolerant.

The standard map/associative-array structure to use in flash actionscript 3?

I'm relatively new to flash, and is confused about what I should use to store and retrieve key value pairs. After some googling I've found various map-like things to choose from:
1) Use a Object:
var map:Object = new Object();
map["key"] = "value";
The problem is that it seems to lack some very basic features. For example to even get the size of map I'd have to write a util method.
2) Use a Dictionary
What does this standard library class provide over the simple object? It seems silly for it to exist if it's functionally identical to Object.
3) Go download some custom HashMap/HashTable implementation from the web.
I've used a lot of modern languages, and this is the first time I haven't been able to find a library implementation of an associative array within 5 minutes. So I'd like to get some best-practice advice from an experienced flash developer.
Thanks!

Maybe your google foo is a bit weak today?
But you're right, the built-in Object object, doesn't provide many extra features.
Dictionaries have at least two important differences with Objects:
They can use any object as a key. For Objects, the key has to be a string (if you pass any other object, the toString() method will be implicitly called).
You can optionally set they keys to be weak referenced (this doesn't make much sense for Objects).
Anyway, there are a number of opensource libraries that implement various data structures and collection types.
Just from the top of my head:
http://lab.polygonal.de/ds/
http://sibirjak.com/blog/index.php/collections/as3commons-collections/

You should use whichever option is needed for a particular situation. There is no correct answer to say "always use 'x'".
I've found that the vast majority of times Object based dictionaries are all that's needed. They're extremely fast and easy to use and I almost never need the extra features. It converts any key to a string which works well for most situations.
Dictionary provides some extra features but I've never needed object-based keys (and I've been programming in Flex since 1.0 alpha 1). The only time I've used a Dictionary is as a hack to get access to a weak reference since Flex doesn't provide a simple weak reference class.
More complex dictionaries are available which will provide more functionality. If you really need this functionality, then they will be useful, but I wouldn't recommend jumping in to use them when a plain old Object will work find for your needs. That said, if you do turn out to need them in your application, it might be best to using the same 3rd-party dictionary everywhere in that app for consistency and easy of maintenance.

Why are getters prefixed with the word "get"?

Generally speaking, creating a fluid API is something that makes all programmers happy; Both for the creators who write the interface, and the consumers who program against it. Looking beyond conventions, why is it that we prefix all our getters with the word "get". Omitting it usually results in a more fluid, easy to read set of instructions, which ultimately leads to happiness (however small or passive). Consider this very simple example. (pseudo code)
Conventional:
person = new Person("Joey")
person.getName().toLower().print()
Alternative:
person = new Person("Joey")
person.name().toLower().print()
Of course this only applies to languages where getters/setters are the norm, but is not directed at any specific language. Were these conventions developed around technical limitations (disambiguation), or simply through the pursuit of a more explicit, intentional feeling type of interface, or perhaps this is just a case of trickle a down norm. What are your thoughts? And how would simple changes to these conventions impact your happiness / daily attitudes towards your craft (however minimal).
Thanks.

Because, in languages without Properties, name() is a function. Without some more information though, it's not necessarily specific about what it's doing (or what it's going to return).
Functions/Methods are also supposed to be Verbs because they are performing some action. name() obviously doesn't fit the bill because it tells you nothing about what action it is performing.
getName() lets you know without a doubt that the method is going to return a name.
In languages with Properties, the fact that something is a Property expresses the same meaning as having get or set attached to it. It merely makes things look a little neater.

The best answer I have ever heard for using the get/set prefixes is as such:
If you didn't use them, both the accessor and mutator (getter and setter) would have the same name; thus, they would be overloaded. Generally, you should only overload a method when each implementation of the method performs a similar function (but with different inputs).
In this case, you would have two methods with the same name that peformed very different functions, and that could be confusing to users of the API.

I always appreciate consistent get/set prefixing when working with a new API and its documentation. The automatic grouping of getters and setters when all functions are listed in their alphabetical order greatly helps to distinguish between simple data access and advanced functinality.
The same is true when using intellisense/auto completion within the IDE.

What about the case where a property is named after an verb?
object.action()
Does this get the type of action to be performed, or execute the action... Adding get/set/do removes the ambiguity which is always a good thing...
object.getAction()
object.setAction(action)
object.doAction()

In school we were taught to use get to distinguish methods from data structures. I never understood why the parens wouldn't be a tipoff. I'm of the personal opinion that overuse of get/set methods can be a horrendous time waster, and it's a phase I see a lot of object oriented programmers go through soon after they start.

I may not write much Objective-C, but since I learned it I've really come to love it's conventions. The very thing you are asking about is addressed by the language.

Here's a Smalltalk answer which I like most. One has to know a few rules about Smalltalk BTW.
fields are only accessible in the they are defined.If you dont write "accessors" you won't be able to do anything with them.
The convention there is having a Variable (let's anme it instVar1.
then you write a function instVar1 which just returns instVar1 and instVar: which sets
the value.
I like this convention much more than anything else. If you see a : somewhere you can bet it's some "setter" in one or the other way.

Custom.
Plus, in C++, if you return a reference, that provides potential information leakage into the class itself.

Data mapping code or reflection code?

Getting data from a database table to an object in code has always seemed like mundane code. There are two ways I have found to do it:
have a code generator that reads a database table and creates the
class and controller to map the datafields to the class fields or
use reflection to take the database field and find it on the class.
The problems noted with the above 2 methods are as noted below
Method 1 seems to me like I'm missing something because I have to create a controller for every table.
Method 2 seems to be too labor intensive once you get into heavy data
access code.
Is there a third route that I should try to get data from a database onto my objects?

You normally use OR (Object-Relational) mappers in such situations. A good framework providing OR functionality is Hibernate. Does this answer your question?

I think the answer to this depends on the available technologies for the language you are going to use.
I for one am very successful with the use of an ORM (NHibernate) so naturally I may recommend option one.
There are other options that you may wish to take though:
If you are using .NET, you may opt to use attributes for your class properties to serve either as a mapping within a class, or as data that can be reflected
If you are using .NET, Fluent NHibernate will make it quite easy to make type-safe mappings within your code.
You can use generics so that you will not need to make a controller for every table, although I admit that it will be likely that you will do the latter anyway. However the generics can contain most of the general CRUD methods that is common to all tables, and you will only need to code specific quirks.

I use reflection to map data back and forth and it works well even under heavy data access. The "third route" is to do everything by hand, which may be faster to run but really slow to write.

I agree with lewap, an ORM (object-relational mapper) really helps in these situations. You may also want to consider the Active Record pattern (discussed in Fowler's Patterns of Enterprise Architecture book). It can really speed up creation of the DAL in simple apps.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex