How to check the efficiency of UIMA Annotators?

I have made a few annotators in UIMA and now I want to check their efficiency. Is there a standardized way to gauge the performance of the annotators?

UIMA itself does not provide immediate support for comparing annotators and evaluating them against a gold standard.
However, there are various tools/implementations out there that provide such functionality on top of UIMA, typically within the confines of the particular tool, e.g.:
U-Compare supports running multiple annotators doing the same thing and comparing their results.
WebAnno is an interactive annotation tool that uses UIMA as its backend and supports comparing annotations from multiple users to each other. There is a class called "CasDiff2" in the code that generates differences and feeds them into DKPro Statistics in the background for the actual agreement calculation. Unfortunately, CasDiff2 cannot really be used separately from WebAnno (yet).
Disclosure: I'm on the WebAnno team and have implemented CasDiff2 in there.
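If you just need a rough standalone check against a gold standard outside of those tools, span-level precision/recall/F1 is the usual measure. Below is a minimal sketch in Python; it assumes the annotations have already been exported from the CAS as (begin, end, label) offset tuples, and the export step itself is left out.

```python
# Minimal sketch: span-level precision/recall/F1 against a gold standard.
# The (begin, end, label) tuples are a hypothetical representation of
# UIMA annotation offsets; any export mechanism from the CAS will do.

def evaluate(predicted, gold):
    """Compare two collections of (begin, end, label) spans with exact matching."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                     # spans present in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [(0, 5, "PER"), (10, 18, "ORG"), (25, 30, "LOC")]
pred = [(0, 5, "PER"), (10, 18, "LOC"), (40, 44, "ORG")]
print(evaluate(pred, gold))  # (0.333..., 0.333..., 0.333...)
```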

Advantage of the components apart from the three main ones

What is the advantage of using the components from OpenMDAO's standard library
(e.g. matrixvectorproduct, dotproduct, linearsystem, etc.)?
As far as I understand, all of them are based on the two base classes ExplicitComponent and ImplicitComponent.
Is there a reason one should use them apart from convenience?
The OpenMDAO standard library of components provides a set of helpful, general-use components, all vectorized and all with verified-to-be-correct analytic derivatives. You're certainly not required, or in any way obligated, to use them at all. However, these components are ones that appear again and again inside many different models that have been built.
Their common appearance motivated us to generalize their implementations and include them in the standard library to avoid the need to either re-implement the components each time, or copy/paste the existing implementation into a new project.
Duplicating code is generally a bad idea, so whenever you can abstract something to be more general and broadly usable, it is worth doing.
If you are smart about how you leverage these components, you can implement some very complex calculations without needing to write the nonlinear or linear code yourself. The Dymos (as of version 0.10.0) and OpenConcept libraries, both built on top of OpenMDAO, use these components extensively to reduce their own coding burden.
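As a concrete illustration of leveraging one of these standard-library components, here is a minimal sketch using DotProductComp; it assumes a reasonably recent OpenMDAO version and the component's default variable names a, b and c.

```python
import numpy as np
import openmdao.api as om

prob = om.Problem()
# DotProductComp is one of the vectorized standard-library components;
# by default it exposes inputs 'a' and 'b' and an output 'c' = a . b.
prob.model.add_subsystem('dot', om.DotProductComp(vec_size=1, length=3),
                         promotes=['*'])
prob.setup()

# Inputs are shaped (vec_size, length).
prob.set_val('a', np.array([[1.0, 2.0, 3.0]]))
prob.set_val('b', np.array([[4.0, 5.0, 6.0]]))
prob.run_model()

print(prob.get_val('c'))  # expected: [32.]
```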

Amazon Alexa dynamic variables for intent

I am trying to build a skill with the Alexa Skills Kit, where a user can invoke an intent by saying something like
GetFriendLocation where is {Friend}
and for Alexa to recognize the variable Friend I have to define all the possible values in a LIST_OF_Friends file. But what if I do not know all the values for Friend and would still like to make a best match against the ones present in some service that my app has access to?
Supposedly, if you stick a small dictionary into a slot (you can put in up to 50,000 samples), it becomes a "generic" slot and is very open to matching anything rather than just what was given to it. In practice, I haven't had much luck with this.
It is a maxim in the field of speech recognition that the more restrictive the vocabulary, the greater the accuracy, and conversely, the larger the vocabulary, the lower the accuracy.
A system like VoiceXML (used mostly for telephone prompt software) has a very strict vocabulary and generally performs well for the domains it has been tailored for.
A system like Watson's speech-to-text is completely open, but makes up for its lack of accuracy by returning a confidence level for several different interpretations of the sounds. In short, it offloads much of the NLP work to you.
Amazon have, very deliberately, chosen a middle road for Alexa. Their intent model allows for more flexibility than VoiceXML, but is not as liberal as a dictation system. The result gives you pretty good options and pretty good quality.
Because of those decisions, they have a voice model where you have to declare, in advance, everything it can recognize. If you do so, you get consistent, good-quality recognition. There are ways, as others have said, to "trick" it into supporting a "generic slot". However, by doing so, you are going outside their design, and consistency and quality suffer.
As far as I know, you cannot dynamically add utterances for intents.
But for your specific question, there is a built-in slot type called AMAZON.US_FIRST_NAME, which may be helpful.
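To make that suggestion concrete, the sketch below shows (as a Python dict) roughly what the relevant piece of the interaction model could look like with the built-in slot type in place of the custom LIST_OF_Friends list. The exact JSON schema varies with the skill-builder version, so treat the surrounding field names as illustrative.

```python
# Rough shape of an interaction model fragment, expressed as a Python dict.
# GetFriendLocation and Friend come from the question; the surrounding
# structure is illustrative and may not match the current ASK schema exactly.
interaction_model = {
    "intents": [
        {
            "name": "GetFriendLocation",
            "samples": ["where is {Friend}"],
            "slots": [
                # Built-in first-name slot instead of enumerating every
                # possible friend in a LIST_OF_Friends custom type.
                {"name": "Friend", "type": "AMAZON.US_FIRST_NAME"}
            ],
        }
    ]
}
```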

Using Rule Flow in InRule for Workflow

I see that Rule Flow supports actions, so it may be possible to build some types of workflow on top of this. In my situation I have a case management application with tasks for different roles, all working on a "document" that flows through different states; depending on the state, a different role will see it in their queue to work on.
I'm not sure what your question is, but InRule comes with direct support for Windows Workflow Foundation, so executing any InRule RuleApplication, including those with RuleFlow definitions, is certainly possible.
If you'd like assistance setting up this integration, I would suggest utilizing the support knowledge base and forums at http://support.inrule.com
Full disclosure: I am an InRule Technology employee.
For case management scenarios, you can use decisions specifically to model a process. Create a custom table or flags in your cases that depict the transition points in your process (steps). As you transition steps, call a decision which will determine if the data state is good enough to make the transition. If it is, then set the flag for the new state. Some folks allow for multiple states at the same time. InRule is a stateless platform; however, when used with CRM it provides 95% of the process logic and relies on CRM to do the persistence. I have written about this pattern in a white paper:
https://info.inrule.com/rs/inruletechnology/images/White_Paper_InRule_Salesforce_Integration.pdf
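For readers who want to see the shape of that pattern without InRule specifics, here is an illustrative Python sketch of the transition-guard idea. It is not InRule's API; the state names and the decision stub are invented for the example.

```python
# Illustrative sketch of the transition-guard pattern described above.
# This is NOT the InRule API; it only models the idea of calling a
# decision before allowing a case to move to its next state.

ALLOWED_TRANSITIONS = {
    "Intake": ["Review"],
    "Review": ["Approved", "Rejected"],
}

def decision_is_satisfied(case: dict, target_state: str) -> bool:
    # Stand-in for invoking a decision in the rules engine; in the real
    # pattern this call is delegated to the rule application.
    return target_state != "Approved" or case.get("documents_complete", False)

def transition(case: dict, target_state: str) -> bool:
    """Set the state flag only if the decision says the data is ready."""
    if target_state not in ALLOWED_TRANSITIONS.get(case["state"], []):
        return False
    if not decision_is_satisfied(case, target_state):
        return False
    case["state"] = target_state   # persistence is left to the host (e.g. CRM)
    return True
```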

Which tincan verbs to use

For data normalisation of standard Tin Can verbs, is it best to use verbs from the Tin Can registry at https://registry.tincanapi.com/#home/verbs e.g.
completed http://activitystrea.ms/schema/1.0/complete
or to use the ADL verbs like those defined:
in the 1.0 spec at https://github.com/adlnet/xAPI-Spec/blob/master/xAPI.md
this article http://tincanapi.com/2013/06/20/deep-dive-verb/
and listed at https://github.com/RusticiSoftware/tin-can-verbs/tree/master/verbs
e.g.
completed http://adlnet.gov/expapi/verbs/completed
I'm confused as to why those in the registry differ from every other example I can find. Is one of these out of date?
It really depends on which "profile" you want to target with your Statements. If you are trying to stick to e-learning practices that most closely resemble SCORM or some other standard, then the ADL verbs may be the most fitting. It is a very limited set, and really only the "voided" verb is provided for by the specification. The other verbs were related to those found in 0.9 and have become the de facto set, but aren't any more "standard" than any other URI. If you are targeting statements to be used in an Activity Streams way, specifically with a social application, then you may want to stick with their set. Note that there are verbs in the Registry that are neither ADL-coined nor provided by the Activity Streams specification.
If you aren't targeting any specific profile (or existing profile) then you should use the terms that best capture the experiences which you are trying to record. And we ask that you either coin those terms at our Registry so that they are well formed and publicly available, or if you coin them under a different domain then at least get them catalogued in our Registry so others may find them. Registering a particular term in one or more registries will hopefully help keep the list of terms from exploding as people search for reusable items. This will ultimately make reporting tools more interoperable with different content providers.
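For comparison, here is a sketch of how the same "completed" statement might look with each of the verb URIs mentioned in the question; the actor and activity are made up for illustration.

```python
# Hypothetical xAPI statement using the ADL "completed" verb; swapping in
# the Activity Streams URI from the registry only changes the verb id.
statement = {
    "actor": {"mbox": "mailto:learner@example.com", "name": "Example Learner"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/completed",
        # Registry / Activity Streams alternative:
        # "id": "http://activitystrea.ms/schema/1.0/complete",
        "display": {"en-US": "completed"},
    },
    "object": {"id": "http://example.com/course/intro",
               "objectType": "Activity"},
}
```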

Modelling / documenting functional programs

I've found UML useful for documenting various aspects of OO systems, particularly class diagrams for overall architecture and sequence diagrams to illustrate particular routines. I'd like to do the same kind of thing for my Clojure applications. I'm not currently interested in Model Driven Development, simply in communicating how applications work.
Is UML a common / reasonable approach to modelling functional programming? Is there a better alternative to UML for FP?
the "many functions on a single data structure" approach of idiomatic Clojure code waters down the typical "this uses that" UML diagram because many of the functions end up pointing at map/reduce/filter.
I get the impression that because Clojure is a somewhat more data centric language a way of visualizing the flow of data could help more than a way of visualizing control flow when you take lazy evaluation into account. It would be really useful to get a "pipe line" diagram of the functions that build sequences.
map and reduce etc would turn these into trees
Most functional programmers prefer types to diagrams. (I mean types very broadly speaking, to include such things as Caml "module types", SML "signatures", and PLT Scheme "units".) To communicate how a large application works, I suggest three things:
Give the type of each module. Since you are using Clojure you may want to check out the "Units" language invented by Matthew Flatt and Matthias Felleisen. The idea is to document the types and the operations that the module depends on and that the module provides.
Give the import dependencies of the interfaces. Here a diagram can be useful; in many cases you can create a diagram automatically using dot. This has the advantage that the diagram always accurately reflects the code.
For some systems you may want to talk about important dependencies of implementations. But usually not—the point of separating interfaces from implementations is that the implementations can be understood only in terms of the interfaces they depend on.
There was recently a related question on architectural thinking in functional languages.
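To make the second suggestion concrete, an import-dependency diagram can be produced by a small script that emits Graphviz dot. The module names and edges below are invented for illustration; a real version would read them out of the ns declarations.

```python
# Rough sketch: emit a Graphviz "dot" file describing which module
# interfaces import which others. Modules and edges are made up here.
deps = {
    "app.core": ["app.parser", "app.store"],
    "app.parser": ["app.store"],
    "app.store": [],
}

with open("deps.dot", "w") as f:
    f.write("digraph interfaces {\n")
    for module, imports in deps.items():
        for imported in imports:
            f.write(f'  "{module}" -> "{imported}";\n')
    f.write("}\n")
# Render with: dot -Tpng deps.dot -o deps.png
```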
It's an interesting question (I've upvoted it), I expect you'll get at least as many opinions as you do responses. Here's my contribution:
What do you want to represent on your diagrams? In OO, one answer to that question might be (considering class diagrams) state (or attributes, if you prefer) and methods. So, I would suggest, class diagrams are obviously not the right thing to start from, since functions have no state and generally implement one function (aka method). Do any of the other UML diagrams provide a better starting point for your thinking? The answer is probably yes, but you need to consider what you want to show and find that starting point yourself.
Once you've written a (sub-)system in a functional language, then you have a (UML) component to represent on the standard sorts of diagram, but perhaps that is too high-level, too abstract, for you.
When I write functional programs, which is not a lot I admit, I tend to document functions as I would document mathematical functions (I work in scientific computing, lots of maths knocking around so this is quite natural for me). For each function I write:
an ID;
sometimes, a description;
a specification of the domain;
a specification of the co-domain;
a statement of the rule, ie the operation that the function performs;
sometimes I write post-conditions too though these are usually adequately specified by the co-domain and rule.
I use LaTeX for this; it's good for mathematical notation, but any other reasonably flexible text or word processor would do. As for diagrams, not so much, but that's probably a reflection of the primitive state of the design of the systems I program functionally. Most of my computing is done on arrays of floating-point numbers, so most of my functions are very easy to compose ad hoc and the structuring of a system is very loose. I could imagine a diagram which showed functions as nodes and inputs/outputs as edges between nodes; in my case there would, more often than not, be edges between every pair of nodes. I'm not sure drawing such a diagram would help me at all.
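As an illustration, one such per-function record might look like the following LaTeX fragment; the function, its ID and its specification are all invented for the example.

```latex
% Hypothetical record for a three-point moving-average function.
\paragraph{F-07: \texttt{smooth}}
\begin{itemize}
  \item Description: three-point moving average of a sample vector.
  \item Domain: $x \in \mathbb{R}^n$, $n \ge 3$.
  \item Co-domain: $y \in \mathbb{R}^{n-2}$.
  \item Rule: $y_i = \tfrac{1}{3}\,(x_i + x_{i+1} + x_{i+2})$ for $1 \le i \le n-2$.
  \item Post-condition: $\min(x) \le y_i \le \max(x)$ for all $i$.
\end{itemize}
```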
I seem to be coming down on the side of telling you no, UML is not a reasonable way of modelling functional systems. Whether it's common, SO will tell us.
This is something I've been trying to experiment with also, and after a few years of programming in Ruby I was used to class/object modeling. In the end I think the types of designs I create for Clojure libraries are actually pretty similar to what I would do for a large C program.
Start by doing an outline of the domain model. List the main pieces of data being moved around and the primary functions being performed on this data. I write these in my notebook, and a lot of the time it will be just a name with 3-5 bullet points underneath it. This outline will probably be a good approximation of your initial namespaces, and it should point out some of the key high-level interfaces.
If it seems pretty straightforward then I'll create empty functions for the high-level interface and just start filling them in. Typically each high-level function will require a couple of support functions, and as you build up the whole interface you will find opportunities for sharing more code, so you refactor as you go.
If it seems like a more difficult problem then I'll start diagramming out the structure of the data and the flow of key functions. Often the diagram and conceptual model that make the most sense will depend on the type of abstractions you choose to use in a specific design. For example, if you use a dataflow library for a Swing GUI then a dependency graph would make sense, but if you are writing a server for processing relational database queries then you might want to diagram pools of agents and pipelines for processing tuples. I think these kinds of models and diagrams are also much more descriptive in terms of conveying to another developer how a program is architected. They show more of the functional connectivity between aspects of your system, rather than the fairly non-specific information conveyed by something like UML.
