I have to solve a CSP logic problem using the Java Constraints Library. So far I have managed to represent some of the problem's constraints, most of them based on "equals" and "not equals" binary constraints. My question is: how do I represent an addition-based constraint? Example:
variable1 belongs to DomainA
variable2 belongs to DomainB
variable3 belongs to DomainA
variable4 belongs to DomainB
Now the constraint:
The sum of variable1 and variable2 is greater than the sum of variable3 and variable4.
Observation: these variables represent money, so they can be added.
Since the Java Constraints Library uses only unary and binary constraints, we have to binarize constraints in order to represent n-ary ones. We can also inherit from the existing relation classes in the library and define new compatible relations. A sketch of one such binarization follows.
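For illustration, here is a minimal Python sketch of one standard binarization, the hidden-variable encoding (this is not JCL's API; all names and domain values are made up for the example). The 4-ary constraint v1 + v2 > v3 + v4 is replaced by a hidden variable H whose domain contains the satisfying 4-tuples, plus one plain binary "projection" constraint between H and each original variable:

from itertools import product

domain_a = [10, 20, 30]  # hypothetical money amounts for DomainA
domain_b = [5, 15, 25]   # hypothetical money amounts for DomainB

# Domain of the hidden variable H: every combination satisfying the sum constraint.
hidden_domain = [
    (v1, v2, v3, v4)
    for v1, v2, v3, v4 in product(domain_a, domain_b, domain_a, domain_b)
    if v1 + v2 > v3 + v4
]

# Binary constraint between H and original variable i: they must agree on position i.
def projection(i):
    return lambda h, v: h[i] == v

binary_constraints = [projection(i) for i in range(4)]

In JCL terms, the same idea could presumably be expressed by subclassing one of the library's relation classes for the projection and adding the hidden variable to the problem; I haven't verified that against the library itself.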
EDIT: as of 2020, the JCL library link is dead; here's the original paper for that library: https://www.aaai.org/Papers/Workshops/1997/WS-97-05/WS97-05-004.pdf
Suppose I have two finite posets (e.g. constructed with sage.combinat.posets.posets.FinitePoset).
I want to calculate the binary relation which is the composition of the order relations of these posets.
How to do this in Sage?
(I am a Sage novice.)
Not yet, apparently. See Trac ticket 24542 for a planned general implementation of binary relations, which is likely what you'd need, since the composition of two poset relations is in general not itself a poset relation. In the meantime you can compute the composition by hand, as sketched below.
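Here is a minimal sketch of doing it by hand in Python, assuming you can get each order relation as a collection of pairs (FinitePoset exposes something along these lines via relations(), if I remember correctly):

def compose(r, s):
    # (a, c) is in the composition iff there is some b
    # with (a, b) in r and (b, c) in s.
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

r = {(1, 2), (2, 3)}
s = {(2, 4), (3, 5)}
print(compose(r, s))  # {(1, 4), (2, 5)}

Note that the result is just a binary relation; as said above, it will generally not be the order relation of a poset.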
How can I use a decision tree graph to determine the significant variables? I know the variable with the largest information gain (i.e. the smallest resulting entropy) should be at the root of the tree. This is my graph; if I want to know which variables are significant, how do I interpret it?
What does significant mean to you? At each node, the variable selected is the most significant one given the context, assuming that selecting by information gain actually works (it doesn't always). For example, at node 11, BB is the most significant discriminator given AA > 20.
Clearly, AA and BB are the most useful, assuming that selecting by information gain gives the best way to partition the data. The rest give further refinement; C and N would be next.
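For reference, "selected by information gain" boils down to the standard entropy computation. A small self-contained Python sketch (not tied to whatever tool produced your graph):

import math
from collections import Counter

def entropy(labels):
    # H = -sum(p_i * log2(p_i)) over the class proportions.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    # Gain = H(parent) minus the size-weighted average entropy of the children.
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

parent = ['yes'] * 5 + ['no'] * 5
perfect_split = [['yes'] * 5, ['no'] * 5]
print(information_gain(parent, perfect_split))  # 1.0, the best possible here

The variable chosen at each node is the one whose split maximizes this quantity over the cases that reach that node.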
What you should be asking is: Should I keep all the nodes?
The answer depends on many things and there is likely no best answer.
One way would be to look at the total case count of each leaf and merge the leaves with few cases.
Not sure how I would do this given your image. It's not really clear what is being shown at the leaves and what 'n' is. Also not sure what 'p' is.
Given a data structure for sets, testing two sets for equality seems to be a desirable task, and indeed many implementations allow this (e.g. builtin sets in python).
There are different set implementations in Erlang: sets, ordsets, gb_sets. Their documentation does not indicate whether it is possible to test equality using term comparison ("=="), nor do they provide explicit functions for testing equality.
Some naive cases seem to allow equality testing with "==", but I have a larger application where I'm able to produce sets and gb_sets which are equal (tested with the function below) but do not compare equal with "==". For ordsets, they always compare equal. Unfortunately I haven't found a way to produce a minimal example for cases where equal sets do not compare equal with "==".
For reliably testing equality I use the following function, based on the theorem that two sets are equal exactly when each is a subset of the other:
%% @doc Compare two sets for equality.
-spec sets_equal(sets:set(), sets:set()) -> boolean().
sets_equal(Set1, Set2) ->
    sets:is_subset(Set1, Set2) andalso sets:is_subset(Set2, Set1).
My questions:
Is there a rationale why the Erlang set implementations do not offer explicit equality testing?
How can the different behavior of "==" across the set implementations be explained?
How can I produce a minimal example of sets that do not compare equal with "==" but are equal according to the code above?
Some thoughts on question 2:
The documentation for sets states that "The representation of a set is not defined.", whereas the documentation for ordsets states that "An ordset is a representation of a set". The documentation for gb_sets gives no comparable indication.
The following comment from the source code of the sets implementation seems to reiterate the statement from the documentation:
Note that as the order of the keys is undefined we may freely reorder keys within a bucket.
My interpretation is that term comparison with "==" in Erlang works on the representation of the sets, i.e. two sets compare equal only if their representations are identical. This would explain the different behavior of the different set implementations, but it also reinforces the question of why there is no explicit equality comparison.
ordsets are implemented as a sorted list, and the implementation is fairly open and meant to be visible. Two ordsets with the same elements are going to compare equal with ==, although == considers 1.0 equal to 1; they won't necessarily compare as strictly equal (=:=).
sets are implemented as a form of hash table, and their internal representation does not lend itself to any form of direct comparison: when hash collisions happen, the last element added is prepended to the list for the given hash entry, and this prepend operation is sensitive to the order in which the elements are added.
gb_sets are implemented as a general balanced tree, and the structure of the tree depends on the order in which the elements were inserted and on when rebalancing took place. They are not safe to compare directly.
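To illustrate why a hash-bucket representation is order-sensitive, here is a toy sketch in Python (the principle is language-independent; this is not Erlang's actual data structure): same contents, different internal layout, so comparing representations fails even though the sets are equal.

class ToySet:
    def __init__(self, n_buckets=2):
        self.buckets = [[] for _ in range(n_buckets)]
    def add(self, x):
        bucket = self.buckets[hash(x) % len(self.buckets)]
        if x not in bucket:
            bucket.insert(0, x)  # prepend on insert, as described above
    def __repr__(self):
        return repr(self.buckets)

a, b = ToySet(), ToySet()
for x in (1, 3): a.add(x)  # 1 and 3 land in the same bucket
for x in (3, 1): b.add(x)
print(a, b)  # [[], [3, 1]] versus [[], [1, 3]]: equal contents, different layout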
To compare two sets of the same type together, an easy way is to call Mod:is_subset(A,B) andalso Mod:is_subset(B,A) -- two sets can only be subsets of each other when they're equal.
Say I have a set of strings:
x=c("a1","b2","c3","d4")
If I have a set of rules that must be met:
if "a1" and "b2" are together in group, then "c3" cannot be in that group.
if "d4" and "a1" are together in a group, then "b2" cannot be in that group.
I was wondering what sort of efficient algorithms are suitable for generating all combinations that meet those rules. What research or papers talk about this type of constrained combination generation problem?
In the above problem, assume it's combn(x,3).
I don't know anything about R, so I'll just address the theoretical aspect of this question.
First, the constraints are really boolean predicates of the form "a1 ∧ b2 → ¬c3" and so on. That means that all valid combinations can be represented by one binary decision diagram, which can be created by taking each of the constraints and ANDing them together. In theory you might create an exponentially large BDD that way (that usually doesn't happen, but it depends on the structure of the problem), but that would mean you can't really list all combinations anyway, so it's probably not too bad.
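Before going the BDD route: for a universe as small as the example, you can simply AND the predicates together and filter the combinations directly. A sketch in Python (the question uses R, but the idea carries over to combn plus Filter):

from itertools import combinations

x = ["a1", "b2", "c3", "d4"]

# Each rule as a predicate over a candidate group (a set of strings).
rules = [
    lambda g: not ({"a1", "b2"} <= g and "c3" in g),
    lambda g: not ({"d4", "a1"} <= g and "b2" in g),
]

valid = [c for c in combinations(x, 3) if all(rule(set(c)) for rule in rules)]
print(valid)  # [('a1', 'c3', 'd4'), ('b2', 'c3', 'd4')]

The BDD/ZDD machinery discussed below pays off when the universe and the number of constraints grow too large for this brute force.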
For example, the BDD generated for those two constraints would be as follows (I think; not tested, just to give an idea): [BDD diagram not reproduced]
But since this is really about a family of sets, a ZDD probably works even better. The difference, roughly, between a BDD and a ZDD is that a BDD compresses nodes that have equal sub-trees (in the total tree of all possibilities), while the ZDD compresses nodes where the solid edge (i.e. "set this variable to 1") goes to False. Both re-use equal sub-trees and thus form a DAG.
The ZDD of the example would be as follows (again not tested): [ZDD diagram not reproduced]
I find ZDDs a bit easier to manipulate in code, because any time a variable can be set, it will appear in the ZDD. In contrast, in a BDD, "skipped" nodes have to be detected, including "between the last node and the leaf", so for a BDD you have to keep track of your universe. For a ZDD, most operations are independent of the universe (except complement, which is rarely needed in the family-of-sets scenario). A downside is that you have to be aware of the universe when constructing the constraints, because they have to contain "don't care" paths for all the variables not mentioned in the constraint.
You can find more information about both BDDs and ZDDs in The Art of Computer Programming, Volume 4A, Section 7.1.4; an old version of that section used to be available for free online.
These methods are particularly nice for representing large numbers of such combinations, and for manipulating them before generating all the possibilities. So this also works when there are many items and many constraints (as long as the final count of combinations is not too large), usually without creating intermediate results of exponential size.
I'm not great with statistical mathematics, etc. I've been wondering, if I use the following:
import uuid
unique_str = str(uuid.uuid4())
double_str = ''.join([str(uuid.uuid4()), str(uuid.uuid4())])
Is double_str "squared" as unique as unique_str, or just some amount more unique? Also, is there any negative implication in doing something like this (like some birthday problem situation, etc.)? This may sound ignorant, but I simply would not know, as my math spans algebra 2 at best.
The uuid4 function returns a UUID created from 16 random bytes and it is extremely unlikely to produce a collision, to the point at which you probably shouldn't even worry about it.
If for some reason uuid4 does produce a duplicate, it is far more likely to be a programming error, such as a failure to correctly initialize the random number generator, than genuine bad luck. In that case the approach you are using will not make things any better: an incorrectly initialized random number generator can still produce duplicates, even with your approach.
If you use the default implementation, random.seed(None), you can see in the source that only 16 bytes of randomness are used to initialize the random number generator, so this is an issue you would have to solve first. Also, if the OS doesn't provide a source of randomness, the system time will be used, which is not very random at all.
But ignoring these practical issues, you are basically along the right lines. To use a mathematical approach, we first have to define what you mean by "uniqueness". I think a reasonable definition is the number of ids n you need to generate before the probability of generating a duplicate exceeds some probability p. An approximate formula for this is:

n ≈ sqrt(2d × ln(1 / (1 − p)))

where d is 2**(16*8) for a single randomly generated uuid and 2**(16*2*8) with your suggested approach. The square root in the formula is indeed due to the Birthday Paradox. But if you work it out, you can see that if you square the range of values d while keeping p constant, then you also square n.
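As a sanity check, the formula can be evaluated directly. A small Python sketch using p = 0.5 (plain arithmetic, nothing here is library-specific):

import math

def n_before_collision(d, p=0.5):
    # Birthday-paradox approximation: number of ids you can draw from a
    # space of size d before the probability of a duplicate exceeds p.
    return math.sqrt(2 * d * math.log(1 / (1 - p)))

single = n_before_collision(2 ** (16 * 8))      # one uuid4
double = n_before_collision(2 ** (16 * 2 * 8))  # two concatenated uuid4s
print(single)           # roughly 2.2e19
print(double / single)  # roughly 1.9e19, i.e. n itself is (nearly) squared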
Since uuid4 is based on a pseudo-random number generator, calling it twice is not going to square the amount of "uniqueness" (and may not even add any uniqueness at all).
See also When should I use uuid.uuid1() vs. uuid.uuid4() in python?
It depends on the random number generator, but it's almost squared uniqueness.