Should I test all enum values in a contract? - pact

I'm unsure whether a certain type of test should be considered a functional test or a contract test.
Let's say I have an API like /getToolType that accepts {"object": "myObject"} as input and returns a type in the form {type: "[a-z]+"}.
It was agreed between client and server that the returned types will match a set of strings, let's say [hammer|knife|screwdriver], so the consumer decided to parse them into an enum, with a fallback value for when the returned type is unknown.
Should the consumer include a test case for each type (hammer, knife, screwdriver) to ensure the producer is still following the agreement that it will always return, for instance, the lowercase string "hammer" when /getToolType is called with a hammer object?
Or would you consider such a test case as functional? And why?
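For reference, the consumer-side parsing described in the question might look roughly like this sketch. The enum name ToolType, the UNKNOWN fallback constant and the fromWire method are hypothetical; the point is that the mapping is exact and case sensitive, so "Hammer" or a new value such as "sword" silently falls back instead of failing.

```java
public class ToolTypeParsing {

    // Hypothetical consumer-side enum; UNKNOWN is the fallback for any
    // string outside the agreed set.
    enum ToolType {
        HAMMER, KNIFE, SCREWDRIVER, UNKNOWN;

        static ToolType fromWire(String raw) {
            if (raw == null) {
                return UNKNOWN;
            }
            // Exact, case-sensitive mapping of the agreed lowercase strings.
            switch (raw) {
                case "hammer": return HAMMER;
                case "knife": return KNIFE;
                case "screwdriver": return SCREWDRIVER;
                default: return UNKNOWN; // e.g. "Hammer" or a new "sword"
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(ToolType.fromWire("hammer")); // HAMMER
        System.out.println(ToolType.fromWire("Hammer")); // UNKNOWN
        System.out.println(ToolType.fromWire("sword"));  // UNKNOWN
    }
}
```

Because the fallback hides unexpected values, a consumer test alone will not notice when the producer starts sending something new, which is what the answers below discuss.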

IMO the short answer is 'no'.
Contract testing is more interested in structure; if we start boundary-testing the API, we move into functional-test territory, which is best covered in the provider code base. You can use a matcher to ensure that only one of those three values is returned, which should ensure that the provider build can't return other values.
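In a Pact consumer test the matcher itself is declared through the Pact DSL; the plain-Java sketch below only illustrates what a regex such as hammer|knife|screwdriver accepts and rejects, assuming the whole value has to match (as Matcher.matches() requires). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class ToolTypeMatcher {

    // The agreed set of values as one regex. matches() requires the whole
    // string to match, so "hammers" and "Hammer" are rejected.
    static final Pattern TOOL_TYPE = Pattern.compile("hammer|knife|screwdriver");

    static boolean isAgreedType(String value) {
        return value != null && TOOL_TYPE.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"hammer", "knife", "screwdriver", "Hammer", "sword"}) {
            System.out.println(v + " -> " + isAgreedType(v));
        }
    }
}
```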
I would echo @J_A_X's comments: there is no right or wrong answer, just be wary of testing all permutations of input/output data.

Great question. Short answer: there's no right or wrong way, just how you want to do it.
Longer answer:
The point of Pact (and contract testing) is to test specific scenarios and make sure that they match up. In your contract, you could simply create a regex that allows any string for those enums, or maybe null, but only if your consumer doesn't care about that value. For instance, if the tool had a brand, I wouldn't care about the brand's value, just that it comes back as a string, since I only display the brand verbatim on the consumer (front-end).
However, if it were up to me, from what I understand of your scenario, the tool type seems pretty important given the endpoint being hit, so I would probably have specific tests and contracts for each enum value to make sure that those particular scenarios on my consumer are valid (I call X with something and I expect Y to have tool type Z).
Both of these solutions are valid; what it comes down to is this: do you think the specific tool type is important to the consumer? If it is, create contracts specific to it; if not, just create a generic contract.
Hope that helps.

The proper state is that the consumer consumes hammer, knife, and screwdriver (c=(hammer,knife,screwdriver) for short), while the producer produces hammer, knife, and screwdriver (p=(hammer,knife,screwdriver)).
There are four regression scenarios:
1. c=(hammer,knife,screwdriver,sword), p=(hammer,knife,screwdriver)
2. c=(hammer,knife,screwdriver), p=(hammer,knife,screwdriver,sword)
3. c=(hammer,knife,screwdriver), p=(hammer,knife)
4. c=(hammer,knife), p=(hammer,knife,screwdriver)
Scenarios 1 and 3 break the contract only in a very soft way.
In the 1st scenario, the consumer has declared a new type that is not (yet) supported by the producer.
In the 3rd scenario, the producer stops supporting a type.
The severity of these scenarios may of course vary: what I consider a soft regression might, in a certain service, be part of a business-critical process.
However, if it is critical then there is a significant motivation to cover it with a dedicated test case.
The 2nd and 4th scenarios are more severe: in both cases the consumer may end up with an error, e.g. it might not be able to deserialize the data.
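The split between soft and severe scenarios comes down to two set differences, sketched here in plain Java. The class and method names are hypothetical, and the sets hold the wire strings from the four regression scenarios.

```java
import java.util.Set;
import java.util.TreeSet;

public class EnumDrift {

    // Values the producer may emit that the consumer cannot map:
    // the severe case (scenarios 2 and 4), where deserialization may fail.
    static Set<String> unknownToConsumer(Set<String> consumer, Set<String> producer) {
        Set<String> diff = new TreeSet<>(producer);
        diff.removeAll(consumer);
        return diff;
    }

    // Values the consumer handles but the producer never emits:
    // the soft case (scenarios 1 and 3), where some consumer code is merely unused.
    static Set<String> neverProduced(Set<String> consumer, Set<String> producer) {
        Set<String> diff = new TreeSet<>(consumer);
        diff.removeAll(producer);
        return diff;
    }

    public static void main(String[] args) {
        Set<String> c = Set.of("hammer", "knife", "screwdriver");
        Set<String> p = Set.of("hammer", "knife", "screwdriver", "sword"); // scenario 2
        System.out.println("severe: " + unknownToConsumer(c, p)); // severe: [sword]
        System.out.println("soft: " + neverProduced(c, p));       // soft: []
    }
}
```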
Having a test case for each type should detect scenarios 3 and 4.
In the 1st scenario, it may prompt the developer to create an extra test case that will fail on the producer side.
However, the test cases are helpless against the 2nd scenario.
So despite the relatively high cost, this strategy does not provide us with full test coverage.
Having one test case with a regex covering all valid types (i.e. hammer|knife|screwdriver) should be a strong trigger for the consumer developer to redesign the test case in the 1st and 4th scenarios.
Once the regex is adjusted to the new consumer capabilities, it can detect the 4th scenario with probability p=1/3 (i.e. the test will fail if the producer selected screwdriver as the sample value).
Even without regex adjustment, it will detect the 3rd scenario with p=1/3.
This strategy is helpless against the 1st and 2nd scenario.
However, on top of the regex, we can do more.
Namely, we can design the producer test case with random data.
Assuming that the type in question is defined as follows:
enum Tool {hammer,knife,screwdriver}
we can render the test data with:
responseBody = Arranger.some(Tool.class);
This piece of code uses test-arranger, but there are other libraries that can do the same as well.
It selects one of the valid enum values.
Each time it can be a different one.
What does it change?
Now we can detect the 2nd scenario and after regex adjustment the 4th one.
So it covers the most severe scenarios.
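To see why random data catches the 2nd scenario, here is a plain-Java simulation. It is not test-arranger: a hypothetical helper, someValue, draws a uniformly random constant via getEnumConstants() and java.util.Random, standing in for Arranger.some(Tool.class). The producer's Tool enum is assumed to have gained a sword constant that the consumer's regex does not list, and the sketch counts how many simulated verification runs would fail.

```java
import java.util.Random;
import java.util.regex.Pattern;

public class RandomEnumDraw {

    // Producer-side enum in scenario 2: it has gained a value (sword)
    // that the consumer does not know about.
    enum Tool { hammer, knife, screwdriver, sword }

    // Consumer-side contract regex, still listing only the original three values.
    static final Pattern CONSUMER_REGEX = Pattern.compile("hammer|knife|screwdriver");

    // Stand-in for a random-data helper: picks any enum constant uniformly.
    static <E extends Enum<E>> E someValue(Class<E> type, Random random) {
        E[] values = type.getEnumConstants();
        return values[random.nextInt(values.length)];
    }

    // Counts how many simulated provider verifications would fail.
    static int failingRuns(int runs, long seed) {
        Random random = new Random(seed);
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            Tool drawn = someValue(Tool.class, random);
            if (!CONSUMER_REGEX.matcher(drawn.name()).matches()) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int runs = 1000;
        System.out.println(failingRuns(runs, 42L) + " of " + runs + " runs fail");
    }
}
```

With four constants, each run fails with probability 1/4, so any single build can still pass; across repeated builds the drift surfaces. When producer and consumer agree, no draw ever fails.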
There is also a drawback to consider.
The producer test is nondeterministic: depending on the drawn value, it can either succeed or fail, and nondeterministic tests are considered an antipattern.
When tests sometimes fail even though the tested code is correct, people start to ignore their results.
Note, however, that the producer test with random data is not that case; it is in fact the opposite.
It can sometimes succeed even though the tested code is not correct.
It is still far from perfect, but it is an interesting tradeoff, as it is the first strategy that manages to address the very severe 2nd scenario.
My recommendation is to use the producer test case with random data, backed by a regex on the consumer side.
Nonetheless, there is no perfect solution, and you should always consider what is important for your services.
Specifically, if the consumer can safely ignore unknown values, the recommended approach might not be the best fit.

Related

Case Sensitivity on Decide Shapes in BizTalk 2013R2

Diagnosing an issue with a BizTalk app where part of its logic doesn't seem to be triggering.
Currently it's designed to use a Decision Shape to filter on 2 values from a specific message.
One of those values is the word 'staff' in lower case, whereas the map that constructs the message uses a string functoid to populate the value as 'Staff' (sentence case).
I'd test this to see if it's the cause, but we don't currently have a test environment, and there are about 8 apps that depend on this one, so I'd need to go through a convoluted process of taking them all offline and deploying the small fix on a gamble.
On that basis, does anyone know: is BizTalk Decide shape expression logic case sensitive?
Yes, the decide shape is case sensitive.
I tested with a rule
Message(FILE.ReceivedFileName) == "D:\\in\\YES.xml"
I dropped the files YES.xml, yes.xml and YES.XML through it, and only YES.xml went through the Rule branch; the other files went through the Else branch.
This is probably due to C# being case sensitive, see Is there a C# case insensitive equals operator?

Asterisk pre-emption and callers in a channel

I would like to have pre-emption calls in Asterisk. I think Asterisk has no built-in support for this feature, so I'm trying to implement it following an algorithm similar to the one shown in this thread: Asterisk - Pre-emption calls
I'm having problems with this step:
check if B is in a call with a lower-priority caller (ASTDB or REALTIME or FastAGI script).
I know how to check whether B is in a call, for example with DEVICE_STATE(device), but I can't find out who the other caller is in order to check their priority.
So, how can I know whether a user is in a call, and who the other party in that call is?
Thanks a lot.
You can read variables of any channel using
SHARED(varname[,channel])
-= Info about function 'SHARED' =-
[Synopsis]
Gets or sets the shared variable specified.
[Description]
Implements a shared variable area, in which you may share variables between
channels.
The variables used in this space are separate from the general namespace
of the channel and thus ${SHARED(foo)} and ${foo} represent two completely
different variables, despite sharing the same name.
Finally, realize that there is an inherent race between channels operating
at the same time, fiddling with each others' internal variables, which is
why this special variable namespace exists; it is to remind you that variables
in the SHARED namespace may change at any time, without warning. You should
therefore take special care to ensure that when using the SHARED namespace,
you retrieve the variable and store it in a regular channel variable before
using it in a set of calculations (or you might be surprised by the
result).
Of course, you must have set the variables first.
You can store the name of the currently speaking channel in a variable or in ASTDB using an in-call macro.
The general complexity of any solution like the one you want is above average; it needs someone with at least 1-2 years of extensive experience with Asterisk.

Modeling an HTTP transition system in Alloy

I want to model an HTTP interaction, i.e. a sequence of HTTPRequest/HTTPResponse, and I am trying to model this as a transition system.
I defined an ordering on a class State by using:
open util/ordering[State]
where a State is simply a set of Messages:
sig State {
msgSet: set Message
}
Each pair of (HTTPRequest->HTTPResponse) and (HTTPResponse->HTTPRequest) is represented as a rule in my transition system.
The rules are expressed in Alloy as predicates that let one move from one state to another.
E.g., this is a rule generating an HTTPResponse after a particular HTTPRequest is received:
pred rsp1 [s, s': State] {
  one msg: Request, msg': Response | (
    // Preconditions (previous Request)
    msg.method = get &&
    msg.address.url = sample_com &&
    // Postconditions (next Response)
    msg'.status = OK_200 &&
    // previous Request has to be in previous state
    msg in s.msgSet &&
    // Response generated is added to next state
    s'.msgSet = s.msgSet + msg'
  )
}
Unfortunately, the resulting model seems to be too complex: we have a dozen rules (more complex than the one above, but following the same pattern), and the execution is very slow.
EDIT: In particular, the CNF generation is extremely slow, while the solving takes a reasonable amount of time.
Do you have any suggestions on how to model a similar transition system?
Thank you very much!
This is a model with an impressive level of detail; thank you for sharing it!
None of the various forms of honestAction by itself takes more than two or three minutes to find an instance (or in some cases to fail to find any instance), except for rsp8, which takes quite a while by itself (it ran for fifteen minutes or so before I stopped it).
So the long CNF preparation times you are observing are apparently caused by (a) predicate rsp8 alone, (b) the size of the disjunction in the honestAction predicate, or (c) both.
I suspect but have not proved that the time issue is caused by combinatorial explosion in the number of individuals required to populate a model and the number of constraints in the model.
My first instinct (it's not more than that) would be to cut back on the level of detail in the model, in particular the large number of singleton signatures which instantiate your abstract signatures. These seem (I could be wrong) to be present either for bookkeeping purposes (so you can identify which rule licenses the transition from one state to another), or because the modeler doesn't trust Alloy to generate concrete instances of signatures like UserName, Password, Code, etc.
As the model now is, it looks as if you're doing a lot of work to define all the individuals involved in a particular example, instead of defining constraints and letting Alloy do the work of finding examples. (Using Alloy to check the properties a particular concrete example can be useful, but there are other ways to do that.)
Since so many of the concrete signatures in the model are constrained to singleton cardinality, I don't actually know that defining them makes the task of finding models more complex; for all I know, it makes it simpler. But my instinct is to think that it would be more useful to know (as well as possibly easier for Alloy to establish) that state transitions have a particular property in general, no matter what hosts, users, and URIs are involved, than to know that property rsp1 applies in all the cases where the host is named examplecom and the address URI is example_url_https and whatnot.
I conjecture that reducing the number of individuals whose existence and properties are prescribed, and the constraints on which individuals can be involved in which state transitions, will reduce the CNF generation time.
If your long-term goal is to test long sequences of state transitions to test whether from a given starting point it's possible or impossible to arrive at a particular state (or kind of state), you may need to re-think the approach to enable shorter sequences of state transitions to do the job.
A second conjecture would involve less restructuring of the model. For reasons I don't think I understand fully, sometimes quantification with one seems to hurt rather than help performance, as in this example, where explicitly quantifying some variables with some instead of one turned out to make a problem tractable instead of intractable.
That question involves quantification in a predicate, not in the model overall, and the quantification with one wasn't intended in the first place, so it may not be relevant here. But we can test the effect of the one keyword on this model in a simple way: I commented out everything in honestAction except rsp8 and ran the predicate first != last in a scope of 8, once with most of the occurrences of one commented out and once with those keywords intact. With the one keywords commented out, the Analyser solved the problem in 24 seconds or so; with the one keywords in place, it had been running for 500 seconds when I decided the point was made and terminated it.
So I'd try removing the keyword one from all of the signatures with instance-specific individuals, leaving it only on get, post, OK_200, etc., and appData. I would also try doing without the various subtypes of Key, SessionID, URL, Host, UserName, and Password, or at least constraining their cardinality in the run command.

A peer-to-peer and privacy-aware data mining/aggregation algorithm: is it possible?

Suppose I have a network of N nodes, each with a unique identity (e.g. public key) communicating with a central-server-less protocol (e.g. DHT, Kad). Each node stores a variable V. With reference to e-voting as an easy example, that variable could be the name of a candidate.
Now I want to execute an "aggregation" function on all V variables available in the network. With reference to e-voting example, I want to count votes.
My question is completely theoretical (I have to prove a statement; details at the end of the question), so please don't focus on e-voting and all of its security aspects. Do I have to say it again? Don't answer that "a node may have any number of identities by generating more keys", "IPs can be traced back", etc., because that's another matter.
Let's see the distributed aggregation only from the privacy point of view.
THE question
Is it possible, in the general case, for a node to compute a function of variables stored at other nodes without learning which value belongs to which node's identity? Have researchers designed such a privacy-aware distributed algorithm?
I'm only dealing with privacy aspects, not general security!
Current thoughts
My current answer is no, so I claim that a central server, which obtains all Vs and processes them without storing them, is necessary, and that the means of ensuring that no individual node's data is either stored or retransmitted by the central server are more legal than technical. I'm asking you to prove that this statement is false :)
In the e-voting example, I think it's impossible to count how many people voted for Alice and Bob without asking all the nodes, one by one "Hey, who do you vote for?"
Real case
I'm doing research in the Personal Data Store (PDS) field. Suppose you store your call log in the PDS and somebody wants to compute statistics about the phone calls (e.g. mean duration, number of calls per day, variance, standard deviation) without either aggregated or individual data about a single person being revealed (that is, nobody may learn either whom I call or my own mean call duration).
If a trusted broker exists and everybody trusts it, that node can expose a double getMeanCallDuration() API that first invokes CallRecord[] getCalls() on every PDS in the network and then computes statistics over all rows. Without the central trusted broker, each PDS exposing double getMyMeanCallDuration() isn't statistically usable (the mean of the means is not, in general, the mean of all calls) and, most importantly, reveals the identity of the individual user.
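The remark about the mean of the means is easy to check with made-up numbers (the durations below are hypothetical). Averaging per-user means gives every user the same weight, whereas combining (sum, count) pairs recovers the pooled mean exactly. Per-user (sum, count) is still individual data, so this only illustrates the arithmetic, not a privacy solution.

```java
public class PooledMean {

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Mean of per-user means: every user weighs the same, regardless of call count.
    static double meanOfMeans(double[][] perUser) {
        double sum = 0;
        for (double[] calls : perUser) sum += mean(calls);
        return sum / perUser.length;
    }

    // Pooled mean rebuilt from per-user (sum, count) pairs.
    static double pooledMean(double[][] perUser) {
        double totalSum = 0;
        long totalCount = 0;
        for (double[] calls : perUser) {
            for (double c : calls) totalSum += c;
            totalCount += calls.length;
        }
        return totalSum / totalCount;
    }

    public static void main(String[] args) {
        // Hypothetical call durations in minutes.
        double[][] perUser = {
            {1, 2, 3, 4, 5, 6, 7, 8, 9}, // 9 short calls, mean 5
            {60}                         // 1 long call, mean 60
        };
        System.out.println(meanOfMeans(perUser)); // 32.5
        System.out.println(pooledMean(perUser));  // 10.5
    }
}
```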
Yes, it is possible, given some assumptions. There is work that answers your question by solving exactly this problem; check the following paper: Privacy, efficiency & fault tolerance in aggregate computations on massive star networks.
You can compute a function (for example, a sum) of the values held by a group of nodes at another node without the participating nodes revealing any data to each other, or even to the node doing the computation. After the computation, everyone learns the result, but no one learns any individual data besides their own, which they already knew anyway. The paper describes the protocol and proves its security (the protocol itself gives you the privacy level I just described).
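The general idea of summing values without revealing them can be illustrated with additive secret sharing. This is a generic textbook technique, not the protocol from the paper cited above; it assumes honest-but-curious nodes that do not collude and private channels between nodes, and the class name and modulus are arbitrary choices.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Random;

public class AdditiveSharingSum {

    // Modulus for share arithmetic (2^61 - 1); must exceed any possible total.
    static final BigInteger MOD = BigInteger.ONE.shiftLeft(61).subtract(BigInteger.ONE);

    // Splits a secret into n shares that sum to the secret modulo MOD.
    static BigInteger[] split(BigInteger secret, int n, Random rnd) {
        BigInteger[] shares = new BigInteger[n];
        BigInteger acc = BigInteger.ZERO;
        for (int i = 0; i < n - 1; i++) {
            shares[i] = new BigInteger(MOD.bitLength(), rnd).mod(MOD);
            acc = acc.add(shares[i]);
        }
        shares[n - 1] = secret.subtract(acc).mod(MOD);
        return shares;
    }

    // Each node sends share j of its value to node j; node j publishes only
    // the sum of the shares it received. Adding the published sums yields the total.
    static BigInteger secureSum(long[] privateValues, Random rnd) {
        int n = privateValues.length;
        BigInteger[] received = new BigInteger[n];
        Arrays.fill(received, BigInteger.ZERO);
        for (long v : privateValues) {
            BigInteger[] shares = split(BigInteger.valueOf(v), n, rnd);
            for (int j = 0; j < n; j++) {
                received[j] = received[j].add(shares[j]).mod(MOD);
            }
        }
        BigInteger total = BigInteger.ZERO;
        for (BigInteger partial : received) {
            total = total.add(partial).mod(MOD);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] callMinutes = {12, 7, 30, 5}; // hypothetical private values
        System.out.println(secureSum(callMinutes, new SecureRandom())); // 54
    }
}
```

Every published partial sum is a sum of uniformly random shares, so on its own it says nothing about any single value; only the total is revealed.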
As for unlinking each node's value from its identity, that is a separate problem. You could use anonymous credentials (check this: https://idemix.wordpress.com/2009/08/18/quick-intro-to-credentials/) or something similar to prove that you hold a valid credential without revealing your identity (in a distributed scenario).
The catch of this protocol is that you need a semi-trusted node to do the computation. A fully distributed protocol (for example, in a P2P network) is not that easy, though. The problem is not a lack of storage (you can use a DHT, for example); rather, you need to replace the trusted or semi-trusted node with the network, and that is where the issues start: who does the computation? Why that node and not another? What if nodes collude? And so on.
How about when each node publishes two sets of data x and y, such that
x + y = v
Assuming that I can emit x and y independently, you can correctly compute the overall mean and sum, while every single message is largely worthless.
So for the voting example and candidates X, Y, Z, I might have one identity publishing the vote
+2 -1 +3
and my second identity publishes the vote:
-2 +2 -3
But of course you can no longer verify that I didn't vote multiple times.
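The split-identity idea can be checked numerically: in the vote example, the two published vectors add up element-wise to (0, 1, 0), a vote for Y. The sketch below splits each hypothetical vote into a random mask x and the remainder y = v - x, then tallies all published vectors without knowing which identities belong together. The mask range is an arbitrary assumption, and small bounded masks are only illustrative, not a formal privacy guarantee.

```java
import java.util.Arrays;
import java.util.Random;

public class SplitVotes {

    // Splits a vote vector v into x and y with x + y = v, where x is a
    // random mask in [-maskRange, maskRange] for each candidate.
    static int[][] split(int[] vote, Random rnd, int maskRange) {
        int[] x = new int[vote.length];
        int[] y = new int[vote.length];
        for (int i = 0; i < vote.length; i++) {
            x[i] = rnd.nextInt(2 * maskRange + 1) - maskRange;
            y[i] = vote[i] - x[i];
        }
        return new int[][] {x, y};
    }

    // Tallies all published vectors, regardless of which identity sent them.
    static int[] tally(int[][] published) {
        int[] totals = new int[published[0].length];
        for (int[] msg : published) {
            for (int i = 0; i < msg.length; i++) totals[i] += msg[i];
        }
        return totals;
    }

    public static void main(String[] args) {
        // The two messages from the example, for candidates X, Y, Z.
        int[][] example = {{2, -1, 3}, {-2, 2, -3}};
        System.out.println(Arrays.toString(tally(example))); // [0, 1, 0]

        // Hypothetical votes of three voters, each split across two identities.
        int[][] votes = {{1, 0, 0}, {0, 1, 0}, {0, 1, 0}};
        Random rnd = new Random();
        int[][] published = new int[votes.length * 2][];
        for (int k = 0; k < votes.length; k++) {
            int[][] xy = split(votes[k], rnd, 5);
            published[2 * k] = xy[0];
            published[2 * k + 1] = xy[1];
        }
        System.out.println(Arrays.toString(tally(published))); // [1, 2, 0]
    }
}
```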

Having two sets of input combined on hadoop

I have a rather simple Hadoop question, which I'll try to present with an example.
Say you have a list of strings and a large file, and you want each mapper to process a piece of the file and one of the strings in a grep-like program.
How are you supposed to do that? I am under the impression that the number of mappers is determined by the number of InputSplits produced. I could run subsequent jobs, one for each string, but that seems kinda... messy?
Edit: I am not actually trying to build a MapReduce version of grep. I used it as an example of having 2 different inputs to a mapper. Let's just say that I have lists A and B and would like each mapper to work on 1 element from list A and 1 element from list B.
So given that the problem experiences no data dependency that would result in the need for chaining jobs, is my only option to somehow share all of list A on all mappers and then input 1 element of list B to each mapper?
What I am trying to do is build some kind of prefix-based look-up structure for my data. So I have a giant text and a set of strings. This process has a strong memory bottleneck, so I was aiming for 1 chunk of text and 1 string per mapper.
Mappers should be able to work independently and without side effects. The parallelism can come from each mapper matching a line against all patterns, so each input is processed only once!
Otherwise, you could multiply each input line by the number of patterns, process each copy with a single pattern, and run the reducer afterwards. A ChainMapper is the solution of choice here. But remember: a line will appear twice if it matches two patterns. Is that what you want?
In my opinion you should prefer the first scenario: Each mapper processes a line independently and checks it against all known patterns.
Hint: You can distribute the patterns to all mappers with the DistributedCache feature! ;-) The input should be split with the InputLineFormat.
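The recommended approach, where one mapper call checks one line against all patterns, can be sketched in plain Java without the Hadoop API. The map method and class name are hypothetical, the patterns list stands in for data that would be loaded once per mapper (e.g. from the distributed cache), matching is plain substring search rather than regex, and the returned pairs are what a real Mapper would write to its context. It also shows the caveat that a line matching two patterns is emitted twice.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GrepLikeMapLogic {

    // Per-record logic of a hypothetical mapper: one input line, all patterns.
    // Emits one (pattern, line) pair per matching pattern, so a line that
    // matches two patterns is emitted twice.
    static List<Map.Entry<String, String>> map(String line, List<String> patterns) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String p : patterns) {
            if (line.contains(p)) {
                out.add(new SimpleEntry<>(p, line));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // In Hadoop these would be loaded once per mapper, e.g. from the distributed cache.
        List<String> patterns = List.of("foo", "bar");
        System.out.println(map("foo and bar", patterns));  // [foo=foo and bar, bar=foo and bar]
        System.out.println(map("nothing here", patterns)); // []
    }
}
```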
A good friend had a great epiphany: what about chaining 2 mappers?
In main, run a job that fires up a mapper (no reducer). The input is the list of strings, and we can arrange things so that each mapper gets only one string.
In turn, that first mapper starts a new job whose input is the text. It can communicate the string by setting a variable in the context.
Regarding your edit:
In general, a mapper is not meant to process 2 elements at once; it should process only one element at a time. The job should be designed so that there could be one mapper per input record and it would still run correctly!
Of course, it is fine for a mapper to need some supporting information to process its input. This information can be passed via the job configuration (Configuration.set(), for example). A larger set of data should be passed via the distributed cache.
Have you looked at either of these options?
I'm not sure if I fully understood your problem, so please check by yourself if that would work ;-)
BTW: An upvote for my well-researched previous answer would be nice ;-)
