Identify the ObjectFactory method used to create the object - reflection

Given an object o which was or could have been created with a JAXB ObjectFactory, what's the best way to find the method in that ObjectFactory which would be used to create the object?
My objective is to be able to generate Java code sufficient to recreate that object (i.e. one or more createXYZ statements).
Does the answer change if I commit to a specific JAXB implementation, say MOXy, for example?
Suppose I just know the object is from some JAXBContext (so one of several ObjectFactory classes could have been used to create it). Does this change the answer at all?
Where the object is a JAXBElement, @XmlElementDecl comes into play. @XmlElementDecl can have a scope. My JAXB objects know their parent, so hopefully this matches the scope.
I've written some proof of concept code which uses getGenericReturnType and getAnnotation(XmlElementDecl.class), to find the method, but I'm guessing there is likely to be stuff in one of the JAXB implementations, which could be re-used to do this more effectively/elegantly.

spaCy adding pointer to another token in custom component

I am trying to find how token.head and token.children are implemented. I want to replicate this implementation as I add a custom component to my spaCy pipeline for SRL.
That is, each token can point to predicates for which it is an argument. Intuitively, I think that this should work kind of like token.children wherein (I think) it returns a generator of the actually dependent child token objects.
I assume that I should not simply store an attribute of that token as this does not seem very memory efficient and rather redundant. Does anyone know the correct way to implement this? Or is this handled implicitly by the spaCy Underscore.set method?
Thanks!
The Token object is only a view -- it's sort of like holding a reference to the Doc object, and an index to the token. The Span object is like this too. This ensures there's a single source of truth, and only one copy of the data.
You can find the definition of the key structs in the spacy/structs.pxd file. This defines the attributes of the TokenC struct. The Doc object then holds an array of these, and a length. The Token objects are created on the fly when you index into the Doc. The data definition for the Doc object can be found in spacy/tokens/doc.pxd, and the implementation of the token access is in spacy/tokens/doc.pyx.
The way the parse tree is encoded in spaCy is a bit unsatisfying. I've made an issue about this on the tracker --- it feels like there should be a better solution.
What we do is encode the offset of the head relative to the token. So if you do &doc.c[i] + doc.c[i].head you'll get a pointer to the head. That part is okay. The part that's a bit weirder is that we track the left and right edges of the token's subtree, and the number of direct left and right children. To get the rightmost or leftmost child, we navigate around within this region. In practice this actually works pretty well because we're dealing with a contiguous block of memory, and loops in Cython are fast. But it still feels a bit janky.
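To make the "view plus relative head offset" idea concrete, here is a minimal pure-Python sketch. It is not spaCy's code: ToyDoc, ToyToken and TokenData are hypothetical stand-ins, and the children lookup is a naive linear scan rather than the subtree-edge navigation described above.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class TokenData:
    """Per-token record standing in for a C struct (hypothetical name)."""
    text: str
    head: int  # offset of the head relative to this token; 0 marks the root


class ToyDoc:
    """Owns the only copy of the token data."""

    def __init__(self, words: List[str], heads: List[int]) -> None:
        # `heads` holds absolute head indices; store them as relative offsets.
        self.data = [
            TokenData(word, head - i)
            for i, (word, head) in enumerate(zip(words, heads))
        ]

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, i: int) -> "ToyToken":
        # Views are built on the fly; no token data is copied.
        return ToyToken(self, i)


class ToyToken:
    """A view: a reference to the document plus an index."""

    def __init__(self, doc: ToyDoc, i: int) -> None:
        self.doc = doc
        self.i = i

    @property
    def text(self) -> str:
        return self.doc.data[self.i].text

    @property
    def head(self) -> "ToyToken":
        # Same arithmetic as "pointer to token + head offset".
        return self.doc[self.i + self.doc.data[self.i].head]

    @property
    def children(self) -> Iterator["ToyToken"]:
        # Naive scan over the whole document, for illustration only.
        for j in range(len(self.doc)):
            if j != self.i and j + self.doc.data[j].head == self.i:
                yield self.doc[j]


doc = ToyDoc(["She", "eats", "green", "apples"], heads=[1, 1, 3, 1])
print(doc[3].head.text)                   # eats
print([t.text for t in doc[1].children])  # ['She', 'apples']
```

Because each ToyToken only holds the document and an index, creating one per access costs very little and there is never a second copy of the parse to keep in sync.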
As for what you'll be able to do as a user: if you run your own fork of spaCy you can happily define your own data on the structs. But then you're running your own fork.
There's no way to attach "real" attributes to the Doc or Token objects, as these are defined as C-level types --- so their structure is defined statically; it's not dynamic. You could subclass the Doc but this is quite ugly: you need to also subclass.
This is why we have the underscore attributes, and the doc.user_data dictionary. It's really the only way to extend the objects. Fortunately you shouldn't really face a data redundancy problem. Nothing is stored on the Token objects. The definitions of your extensions are stored globally, within the Underscore class. Data is stored on the Doc object, even if it applies to a token --- again, the Token is a view. It can't own anything. So the Doc has to note that we have some value assigned to token i.
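For the SRL case in the question, the storage pattern described here could look roughly like the sketch below. This is plain Python, not spaCy's Underscore implementation: ToyDoc, ToyToken, set_predicates and the key layout of user_data are all assumptions made for illustration. The point is only that the token stores indices on the document and rebuilds views when asked.

```python
from typing import Any, Dict, List, Tuple


class ToyDoc:
    """Hypothetical document that owns all extension data."""

    def __init__(self, words: List[str]) -> None:
        self.words = words
        # One dictionary on the document holds every extension value,
        # keyed by (attribute name, token index).
        self.user_data: Dict[Tuple[str, int], Any] = {}

    def __getitem__(self, i: int) -> "ToyToken":
        return ToyToken(self, i)


class ToyToken:
    """A view that owns nothing: just the document and an index."""

    def __init__(self, doc: ToyDoc, i: int) -> None:
        self.doc = doc
        self.i = i

    @property
    def text(self) -> str:
        return self.doc.words[self.i]

    def set_predicates(self, predicate_indices: List[int]) -> None:
        # Store plain integers, not token objects, so nothing is duplicated.
        self.doc.user_data[("predicates", self.i)] = list(predicate_indices)

    @property
    def predicates(self) -> List["ToyToken"]:
        indices = self.doc.user_data.get(("predicates", self.i), [])
        return [self.doc[j] for j in indices]


doc = ToyDoc(["She", "eats", "apples"])
doc[0].set_predicates([1])  # "She" is an argument of the predicate "eats"
print([t.text for t in doc[0].predicates])  # ['eats']
```

Note that doc[0] is a fresh view on every access, yet it still sees the stored predicates, because the data lives on the document.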
If you're defining a tree-navigation system, I'd recommend considering defining it as your own Cython class, so you can use structs. If you use native Python types it'll be pretty slow and pretty large. If you pack the data into numpy arrays the representation will be more compact, but writing the code will be a pretty miserable experience, and the performance is likely to be not great.
In short:
Define your own types in Cython. Put the data into a struct owned by a cdef class, and give the class accessor methods.
Use the underscore attributes to access the data from spaCy's Doc, Span and Token objects.
If you come up with a compelling API for SRL and the data can be coded compactly into the TokenC struct, we'd consider adding it as native support.

What's best practice in this situation?

I was just writing a small asp.net web page to display a collection of objects by binding to a repeater, when this came to mind.
Basically the class I've created, let's call it 'Test', has a price property that's an integer data type (ignore the limitations of using this type, I'm just using it as an example). However I want to format this property so it displays a currency and the correct decimal places etc.
Is it best practice to have a function within the class that returns the formatted string for the object, or would it be better to have a function in the back end of my web form that operates on the object and returns the formatted string?
I've heard before that a class should contain all its related functions, but I've also heard that presentation should be kept in the 'presentation layer' in my N-tier app.
What would be the best approach in my situation? (and apologies if I haven't explained this clearly enough!)
Thanks!
In my opinion, both options are valid from an OO point of view.
Since the value is a price (that just happens to have the wrong data type), it makes sense to put the formatting into the data class. It's not something that's specific to the web interface, and, if you develop a different kind of user interface, you are very likely to require this formatting again.
On the other hand, it's a presentation issue, so it also makes sense to put it into the presentation layer.
For general OOP stuff, the object should not be exposing implementation details. I choose to interpret this as "avoid setters and getters when possible".
In the context of your question, I suggest that you have a getPriceDisplay() method that returns a string containing the formatted price.
The actual implementation of the formatting is hidden in the implementation details. You could provide a generic function for formatting, use some backend call, or something else. Those details should make no difference to the consumer of the 'Test' object.
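As a language-neutral illustration (written in Python rather than C#), the idea might be sketched like this. The dollar sign, the two decimal places and the assumption that the integer holds whole currency units are all made up for the example; the question does not specify them.

```python
class Test:
    """Hypothetical data class with an integer price, as in the question."""

    def __init__(self, price: int) -> None:
        self.price = price  # assumed to be whole currency units

    def get_price_display(self) -> str:
        # The formatting rule lives in one place; callers only see a string.
        return f"${self.price:,.2f}"


item = Test(1234)
print(item.get_price_display())  # $1,234.00
```

If the format later comes from configuration or a shared helper, only the body of get_price_display changes and the page that binds to it stays the same.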
Though it's not an OOP approach, in my opinion, this is a good time for an extension method. Call it .ToCurrency() which has the format of the currency...this could be taken from the Web.Config file if you wanted.
Edit
To elaborate, I would simply call .ToString("your-format") (of course this could be as simple as .ToString("C") for your specific question) in the extension method. This allows you to change the format throughout the UI in one place. I have found this to be very useful when dealing with DateTime formats in web applications.
Wouldn't .ToString("C"); do the job? This would be in the presentation layer I would imagine.

Why create new classes in R?

I know that you can create new classes in R, but why would you want to? I've thought of two reasons:
You can use the is. function to test whether an object belongs to a particular class (classifications of objects)
To only allow certain classes of entries into slots of an object (e.g., only a string for the surname and only a number for a zip code in the person class).
I haven't thought of situations where these benefits couldn't be achieved fairly easily by other means or when they'd really be useful.
I hope that this isn't too open ended and more concrete examples how one might use defining classes would be great. Thanks for any thoughts.
It's called Object-Oriented programming. Look it up, but in short:
Objects encapsulate behaviour - e.g. the behaviour of the 'print' method for a class is specific to that class. You can then keep the code for that method on that class separate from other code. You then only have to tell your users to "print" the thing - which is something they already do - and they get a nicely custom printed version of your thing, without having to use a special print function, like "printMyThing(thing)".
Objects inherit behaviour from their parent classes - e.g. the 'formula' method for the glm class falls back to the formula method for the lm class (not sure if this is true, but it's just for illustration).
In short, it's a Good Thing.

Determining type of CollectionBase via Reflections (or Microsoft.Cci)

Question:
Is there a static way to reliably determine the type contained by a type derived from CollectionBase, using Reflection or Microsoft.Cci?
Background:
I am working on a code generator that copies types, makes customized versions of those types, and converters between. It walks the types in the source assembly via Microsoft.Cci. It prints out source code using textual templates. It does a lot of conversion and customization, and tosses out code that I don't care about.
In my resulting code, I intend to replace List<T> everywhere that a CollectionBase, IEnumerable<T>, or T[] was previously used. I want to use List<T> because I am pretty sure I can serialize it without extra work, which is important for my application. T is concrete in every case. I am trying not to copy CollectionBase classes because I'd have to copy over the custom implementation, and I'd like to avoid having to do that in my code generator.
The only part I'm having a problem with is determining T for List<T> when replacing a custom CollectionBase.
What I've done so far:
I have briefly looked at the MSDN docs and samples for CollectionBase, and they mention creating a custom Add method on your derived type. I don't think this is in any way enforced, so I'm not sure I can rely on that. An implementor could name it something else, or worse, have a collection that supports multiple types, with Object as their only common ancestor.
Alternatives I have considered:
Maybe the default serialization does some tricks that I can take advantage of. Is there a default serialization for CollectionBase collections, or do you generally have to implement it yourself? If you have to do it yourself, is there some reliable metadata I could look at in order to determine the types? If it supports default serialization, does it rely on the runtime types of the items in the collection?
I could make a mapping in my code generator of known CollectionBase types, mapped to their corresponding T for List<T>. If a given CollectionBase type that I encounter isn't in the list, throw an exception. This is probably what I'll go with if there isn't a reliable alternative.
I'm still not sure enough about what you want to do to give advice. Still, do your CollectionBase-derived classes all implement Add(T)? If so, you could look for an Add method with single parameter of type other than object, and use that type for T.

ASP.NET. Is it better to pass an entire entity to a method or pass each property of this entity as parameters?

We're developing a business ASP.NET application. Is it better to pass an entire entity to a method or pass each property of this entity as parameters? What is the best practice?
Case 1. Pass Customer entity to a manager - InsertCustomer(Customer cust)
Case 2. Pass each property as a parameter - InsertCustomer(string name, string address...etc)
P.S. We're using Entity Framework as our data access layer
Pass the entire entity, not only for the reasons given in the other answers, but because methods with long parameter lists are generally bad. They are prone to error and tough to work with from a development standpoint (just look at Interop with Office).
In general, if I see I am getting too many parameters (usually more than three), either I have a method trying to do too much, or I explore ways of encapsulating this data in a struct.
You should pass the entire entity, because when you change the entity (e.g. add or remove members) you do not have to update all your method calls in all your layers. You only need to change your data layer and the layer where you are consuming the entity. ASP.NET is object-oriented, so you should orient your code around your objects.
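A small language-neutral sketch of that point, in Python rather than C#: the Customer fields, the later-added email field and the list standing in for a database table are all hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Customer:
    name: str
    address: str
    email: str = ""  # imagine this field was added later


def insert_customer(cust: Customer, table: List[Dict[str, str]]) -> None:
    # The signature stays the same however many fields Customer gains.
    table.append(asdict(cust))


table: List[Dict[str, str]] = []
insert_customer(Customer("Ada", "1 Main St"), table)
print(table[0])
```

With the per-property style, adding email would mean touching the signature of insert_customer and every call to it.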
The whole concept of object orientation requires objects to be passed around. If all is happening internally I would go with this.
If this is being posted to a webservice / across a network etc you would need to serialize, and hence may find it better to pass each individual parameter, especially if the receiving framework is different.
Don't forget your Strings etc are all objects too.
I agree with another poster, passing a whole entity "encapsulates" everything so that it can be updated/modified so you have less to worry about.
