Can LUIS reliably differentiate between questions and statements?
We are designing a process flow which will differ depending on whether an utterance contains a question. We can't rely on users using obvious tokens like question marks.
Could I train a model with a number of utterances that use question-type syntax and flag those with a Question intent, while those without get the None intent? This seems like a fairly standard requirement, so I wonder if there's an existing generic solution.
LUIS does not differentiate between utterances based on question marks alone. The tokens that appear repeatedly in one intent and not in all the others are what the learnt model uses to tell intents apart. If you replicate every question from one intent in the None intent without the question mark, the only distinguishing feature left is the question mark, which is apparently not what you want. That is also not the function of the None intent. A question can present itself in multiple forms, so what you want is to cover those forms and then add non-question statements that are not directly related to the None intent. This would capture the tokens that are relevant, like "What is", "Where is", "What if", etc., and not just the question mark.
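As a rough illustration of the point about training data, here is a minimal Python sketch. It does not call LUIS; the intent names ("Question", "Statement") and the example utterances are made up for illustration. It only checks that no utterance appears under two intents with punctuation as the sole difference, which is the situation described above where the question mark becomes the only signal.

```python
# Hypothetical training utterances; intent names are illustrative, not LUIS built-ins.
training_utterances = {
    "Question": [
        "what is the status of my order",
        "where is the nearest branch",
        "what if I miss the payment date",
        "can you tell me when the office opens",
    ],
    "Statement": [
        "my order arrived yesterday",
        "the nearest branch was closed",
        "I missed the payment date",
        "the office opens at nine",
    ],
}


def normalise(text):
    """Lower-case and drop trailing punctuation so '?' cannot be the only difference."""
    return text.lower().rstrip("?!. ")


def punctuation_only_duplicates(utterances):
    """Return utterances found under more than one intent once punctuation is ignored."""
    seen = {}
    clashes = set()
    for intent, examples in utterances.items():
        for example in examples:
            key = normalise(example)
            if key in seen and seen[key] != intent:
                clashes.add(key)
            seen.setdefault(key, intent)
    return clashes


print(punctuation_only_duplicates(training_utterances))
```

If the returned set is not empty, those utterances differ between intents only by punctuation and are worth rewording before training.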
My organization is just starting to dive into Dynamics CRM and one of the questions that has come up is when should we combine various applications into one instance and when should they be separated into multiple instances?
I know the answer to that question depends on the situation, so I'm trying to come up with a list of questions that can be asked to help determine which direction makes the most sense.
I'm having a surprisingly difficult time finding any discussion of this online, so thought I'd ask here. So, what questions do you ask when deciding whether a system/set of functionality should be in a separate instance?
Edit:
I wasn't very clear about our type of organization. I work for a City with several Departments that provide different services and serve different customers with often very different functionality required.
I'm concerned about the urge to put all of these different systems that have different functionality and track different "customers" into one system. I fear there will be issues about managing all of the various entities that apply to different systems and ensuring that requests for changes from one set of users doesn't cause problems for a different set of users.
I'm sure sometimes it will make sense to combine multiple systems into one instance, but I think there may be just as many times where we don't want to put them together, so I wanted to come up with a list of questions to ask.
Some basic ones would be:
1) Do systems share common data (e.g., same customers)?
2) Do systems share common functionality?
3) Do systems collect the same kind of data?
4) Are there requirements to report on combined data from these systems?
5) Will it be easier to manage security by separating instances or through user roles?
In my experience a single instance is the norm. The benefits of a single instance are very significant in my opinion.
A few points you may want to consider:
Do you want data in silos? If so, multi-instance provides a very easy way to achieve this. However a single instance with appropriate security modelling can also achieve this.
Do you want to combine data across applications into a single business process? If so, multi-instance means you have to build an integration between instances. Single instance does not have this problem.
Do you want to use custom built features in every instance? If so, a single instance provides this straight away. Multi-instance requires separate development and deployment to every instance which may increase costs.
Have you considered licensing? I'm not a licensing expert, but I believe if you are online multi-instance will attract a higher license cost.
As a rule of thumb I would say a single instance is the default position, as it allows you to easily combine data and processes. If you want to go multi-instance, just have a good reason why, and be sure it's not something that can be provided by a single instance.
We are using Behaviour Driven Development to develop a SOA system using Scrum and have come across two approaches to producing the stories.
Approach 1
Given Specific Message Type is available
And Specific State exists
When the Message is processed
Then expected resulting state exists
Approach 2
Given a Specific state exists
When Specific Message Type is processed
Then expected resulting state exists
Few if any of the examples available are applied to testing SOA systems. I would appreciate any experiences of these or any insights on the consequences of each approach.
We are aiming for declarative rather than imperative stories. The message arrival in the first has a slightly imperative feel but I'm not confident the second approach covers acceptance criteria adequately, because it doesn't seem to account for the event driven nature of the SUT.
The aim of the story is to communicate with your customer, so whatever style promotes that goal is best, and that will vary from one team to another. I might prefer 'when some business event occurs' rather than your suggestions, but I don't know your team! Beware of trying to find a 'one-size-fits-all' template; use whatever communicates best for each situation. And the heart of agile is the ability to adapt: try one style and feel free to adapt if it doesn't seem to be working.
Whenever I'm producing a library or service, I often find it useful to provide an example of the kind of scenario which a service user might want. So for instance:
Given the server has information about risk limits for Lieumoney Brothers
And we are $2m from those limits
When we process EOD orders that take us to $1m for those limits
Then our status with Lieumoney Brothers should be Amber.
The actual contents of the message can then reflect the interaction with those particular sums and that particular counterparty. You can use this for lots of different domains, and your approach will depend on the domain and whether the availability of a message is unusual, for that domain. In the above example where you're trading millions then having risk information for a particular counterparty might be valuable, and if that's the state, it's worth calling out separately. It's probably less important if you're buying baby rabbits, for example.
Given "Rotweiller Pets Limited" is trading baby rabbits $2 cheaper than anyone else
When we ask the system to order 15 baby rabbits
Then it should place an order with "Rotweiller Pets Limited".
It's hard to discuss this without specific examples. However, you can probably see how providing those scenarios would then act as documentation for how to use your APIs, even if the underlying automation for those scenarios talks directly to the API, and has nothing actually specific for pets or Lieumoney trades.
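To make that last point a bit more concrete, here is a minimal Python sketch of the baby rabbits scenario automated directly against an API. The functions `choose_supplier` and `place_order`, the other supplier names, and the prices are all hypothetical stand-ins for whatever your service actually exposes; the only point is that the Given/When/Then wording survives as comments while the automation talks to the API.

```python
def choose_supplier(prices):
    """Hypothetical system under test: pick the supplier with the lowest unit price."""
    return min(prices, key=prices.get)


def place_order(prices, item, quantity):
    """Hypothetical API call: return the order that would be placed."""
    return {"supplier": choose_supplier(prices), "item": item, "quantity": quantity}


def test_orders_from_cheapest_supplier():
    # Given "Rotweiller Pets Limited" is trading baby rabbits $2 cheaper than anyone else
    prices = {"Rotweiller Pets Limited": 8.0, "Fluffy Friends": 10.0, "Bunny Barn": 12.0}
    # When we ask the system to order 15 baby rabbits
    order = place_order(prices, "baby rabbit", 15)
    # Then it should place an order with "Rotweiller Pets Limited"
    assert order["supplier"] == "Rotweiller Pets Limited"
    assert order["quantity"] == 15


test_orders_from_cheapest_supplier()
```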
I'm working on a project right now where I have been slowly accumulating a bunch of different variables from a bunch of different sources. Being a somewhat clever person, I created a different sub-directory for each under a main "original_data" directory, and included a .txt file with the URL and other descriptors of where I got the data from. Being an insufficiently clever person, these .txt files have no structure.
Now I am faced with the task of compiling a methods section which documents all the different data sources. I am willing to go through and add structure to the data, but then I would need to find or build a reporting tool to scan through the directories and extract the information.
This seems like something that ProjectTemplate would have already, but I can't seem to find that functionality there.
Does such a tool exist?
If it does not, what considerations should be taken into account to provide maximum flexibility? Some preliminary thoughts:
1. A markup language should be used (YAML?)
2. All sub-directories should be scanned
3. To facilitate (2), a standard extension for a dataset descriptor should be used
4. Critically, to make this most useful there needs to be some way to match variable descriptors with the name that they ultimately take on. Therefore either all renaming of variables has to be done in the source files rather than in a cleaning step (less than ideal), some code-parsing has to be done by the documentation engine to track variable name changes (ugh!), or some simpler hybrid such as allowing the variable renames to be specified in the markup file should be used.
5. Ideally the report would be templated as well (e.g. "We pulled the [var] variable from [dset] dataset on [date]."), and possibly linked to Sweave.
6. The tool should be flexible enough to not be overly burdensome. This means that minimal documentation would simply be a dataset name.
This is a very good question: people should be very concerned about all of the sequences of data collection, aggregation, transformation, etc., that form the basis for statistical results. Unfortunately, this is not widely practiced.
Before addressing your questions, I want to emphasize that this appears quite related to the general aim of managing data provenance. I might as well give you a Google link to read more. :) There are a bunch of resources that you'll find, such as the surveys, software tools (e.g. some listed in the Wikipedia entry), various research projects (e.g. the Provenance Challenge), and more.
That's a conceptual start, now to address practical issues:
I'm working on a project right now where I have been slowly accumulating a bunch of different variables from a bunch of different sources. Being a somewhat clever person, I created a different sub-directory for each under a main "original_data" directory, and included a .txt file with the URL and other descriptors of where I got the data from. Being an insufficiently clever person, these .txt files have no structure.
Welcome to everyone's nightmare. :)
Now I am faced with the task of compiling a methods section which documents all the different data sources. I am willing to go through and add structure to the data, but then I would need to find or build a reporting tool to scan through the directories and extract the information.
No problem. list.files(...,recursive = TRUE) might become a good friend; see also listDirectory() in R.utils.
It's worth noting that filling in a methods section on data sources is a narrow application within data provenance. In fact, it's rather unfortunate that the CRAN Task View on Reproducible Research focuses only on documentation. The aims of data provenance are, in my experience, a subset of reproducible research, and documentation of data manipulation and results are a subset of data provenance. Thus, this task view is still in its infancy regarding reproducible research. It might be useful for your aims, but you'll eventually outgrow it. :)
Does such a tool exist?
Yes. What are such tools? Mon dieu... it is very application-centric in general. Within R, I think that these tools are not given much attention (* see below). That's rather unfortunate - either I'm missing something, or else the R community is missing something that we should be using.
For the basic process that you've described, I typically use JSON (see this answer and this answer for comments on what I'm up to). For much of my work, I represent this as a "data flow model" (that term can be ambiguous, by the way, especially in the context of computing, but I mean it from a statistical analyses perspective). In many cases, this flow is described via JSON, so it is not hard to extract the sequence from JSON to address how particular results arose.
For more complex or regulated projects, JSON is not enough, and I use databases to define how data was collected, transformed, etc. For regulated projects, the database may have lots of authentication, logging, and more in it, to ensure that data provenance is well documented. I suspect that that kind of DB is well beyond your interest, so let's move on...
1. A markup language should be used (YAML?)
Frankly, whatever you need to describe your data flow will be adequate. Most of the time, I find it adequate to have good JSON, good data directory layouts, and good sequencing of scripts.
2. All sub-directories should be scanned
Done: listDirectory()
3. To facilitate (2), a standard extension for a dataset descriptor should be used
Trivial: ".json". ;-) Or ".SecretSauce" works, too.
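For what it's worth, points 2 and 3 together amount to very little code. The sketch below is in Python rather than R, purely as an illustration; the function name `load_descriptors` and the idea of keying each descriptor by its relative path are my own assumptions, not an existing tool.

```python
import json
from pathlib import Path


def load_descriptors(root, extension=".json"):
    """Recursively collect descriptor files under root, keyed by relative path."""
    descriptors = {}
    for path in sorted(Path(root).rglob(f"*{extension}")):
        with path.open(encoding="utf-8") as handle:
            # Use forward slashes in the key so results look the same on every OS.
            descriptors[path.relative_to(root).as_posix()] = json.load(handle)
    return descriptors
```

Calling `load_descriptors("original_data")` would return one parsed descriptor per file found in any sub-directory, and files with other extensions are ignored.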
4. Critically, to make this most useful there needs to be some way to match variable descriptors with the name that they ultimately take on. Therefore either all renaming of variables has to be done in the source files rather than in a cleaning step (less than ideal), some code-parsing has to be done by the documentation engine to track variable name changes (ugh!), or some simpler hybrid such as allowing the variable renames to be specified in the markup file should be used.
As stated, this doesn't quite make sense. Suppose that I take var1 and var2, and create var3 and var4. Perhaps var4 is just a mapping of var2 to its quantiles and var3 is the observation-wise maximum of var1 and var2; or I might create var4 from var2 by truncating extreme values. If I do so, do I retain the name of var2? On the other hand, if you're referring to simply matching "long names" with "simple names" (i.e. text descriptors to R variables), then this is something only you can do. If you have very structured data, it's not hard to create a list of text names matching variable names; alternatively, you could create tokens upon which string substitution could be performed. I don't think it's hard to create a CSV (or, better yet, JSON ;-)) that matches variable name to descriptor. Simply keep checking that all variables have matching descriptor strings, and stop once that's done.
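The "keep checking that all variables have matching descriptor strings" step can be a one-liner. A minimal Python sketch, where `missing_descriptors` and the example names are hypothetical:

```python
def missing_descriptors(variable_names, descriptors):
    """Return variable names that have no non-empty descriptor, in original order."""
    return [name for name in variable_names if not descriptors.get(name)]


# Hypothetical mapping of variable names to long text descriptors.
descriptors = {"var1": "Household income (USD)", "var2": "Age in years", "var3": ""}
print(missing_descriptors(["var1", "var2", "var3", "var4"], descriptors))
```

Here `var3` (empty descriptor) and `var4` (no entry) are reported; you are done when the list comes back empty.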
5. Ideally the report would be templated as well (e.g. "We pulled the [var] variable from [dset] dataset on [date]."), and possibly linked to Sweave.
That's where others' suggestions of roxygen and roxygen2 can apply.
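If you only need the sentence from the question's example, plain string formatting may already be enough. This is a minimal Python sketch, not a feature of roxygen or Sweave; the descriptor keys `dataset`, `retrieved` and `variables` are assumptions about how you might structure your files.

```python
TEMPLATE = "We pulled the {var} variable from the {dset} dataset on {date}."


def methods_sentences(descriptor):
    """Render one sentence per variable listed in a single dataset descriptor."""
    date = descriptor.get("retrieved", "an unrecorded date")
    return [
        TEMPLATE.format(var=var, dset=descriptor["dataset"], date=date)
        for var in descriptor.get("variables", [])
    ]


# Hypothetical descriptor, e.g. as parsed from one JSON file.
example = {"dataset": "census", "retrieved": "2012-03-01", "variables": ["income", "age"]}
for sentence in methods_sentences(example):
    print(sentence)
```

A descriptor that contains only a dataset name simply produces no sentences, so minimal documentation does not break the report.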
6. The tool should be flexible enough to not be overly burdensome. This means that minimal documentation would simply be a dataset name.
Hmm, I'm stumped here. :)
(*) By the way, if you want one FOSS project that relates to this, check out Taverna. It has been integrated with R as documented in several places. This may be overkill for your needs at this time, but it's worth investigating as an example of a decently mature workflow system.
Note 1: Because I frequently use bigmemory for large data sets, I have to name the columns of each matrix. These are stored in a descriptor file for each binary file. That process encourages the creation of descriptors matching variable names (and matrices) to descriptors. If you store your data in a database or other external files supporting random access and multiple R/W access (e.g. memory mapped files, HDF5 files, anything but .rdat files), you will likely find that adding descriptors becomes second nature.
My dbml file is just getting bigger and bigger and more unwieldy:
I favoured an all-in-one approach as opposed to multiple data contexts because when I tried that it was near impossible to manage in code. I was advised it was better to have them all in one chart, and that the difficulty would simply be in managing this chart and not in code.
The chart I've got is becoming a pain to manage. If I want to even remove a table and re-add it, it sometimes takes a little while to manually find it! There isn't even a list I can find in VS2010 of the objects you have in that chart!
Is there a better way of doing this?
Generally speaking, group tables related to the same concept in the same diagram and create multiple diagrams. Yes, that means you have to MANAGE each diagram, but generally this is a GOOD thing. Here's why: same database schema, different diagrams, each diagram representing a specific subset of the business. So a product catalog section, an order section, a billing section, a returns section, a sales section, etc. Just make sure each one groups up to a specific line of business. Yes, this does mean that tables will be repeated on different diagrams.
By segmenting the table structure into business logic groupings, you can quickly see all the tables related to that grouping. This is helpful to developers, as they have to work in those specific sections; they understand the scope of work without having to understand the entire database structure. When making a change, if you find a table is on multiple groupings/diagrams, you can see what areas of the business are impacted by the change. This gives you an idea of the areas of the application which need to be tested, or at a minimum considered, when you're making a change to the database structures. Ideally this type of modeling would be implemented in relation to services offered in a Service Management style of architecture. However, starting to group your tables into business processes would help. If you think this is unwieldy... try looking at an Oracle db that has over 1500 tables in its schema.
The overall trick here is to only show those tables/views related to the business process/service someone would NEED to look at to support the system.
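To illustrate the "which areas are impacted" idea, here is a minimal Python sketch. The diagram names and table names are invented for the example, and this is just a way of reasoning about the groupings, not something the dbml designer provides.

```python
# Hypothetical groupings: one diagram per line of business, tables may repeat.
diagrams = {
    "catalog": {"Product", "Category", "Price"},
    "orders": {"Order", "OrderLine", "Product", "Customer"},
    "billing": {"Invoice", "Order", "Customer"},
    "returns": {"Return", "OrderLine", "Product"},
}


def impacted_areas(table, diagrams):
    """List the diagrams (business areas) that include the given table."""
    return sorted(name for name, tables in diagrams.items() if table in tables)


print(impacted_areas("Product", diagrams))  # ['catalog', 'orders', 'returns']
print(impacted_areas("Invoice", diagrams))  # ['billing']
```

A change to `Product` touches three areas that should be tested, while a change to `Invoice` stays within billing.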
Good luck!