Defining multiplicity of attribute with constraint - constraints

I have two classes in UML and need to create a constraint for this part: attribute1:String is in class1 and attribute2:int is in class2. The connection between the classes is a generalization, but it can be changed to an association.
I need to express something like this:
if attribute1 contains 'First year',
then attribute2 has multiplicity [1..2],
else if attribute1 contains 'Second year',
then attribute2 has multiplicity [3..4], and so on.
I know all the values that attribute1 can take; they are defined as an enumeration (12 values, but only 4 if-conditions are needed because every 3 values share the same text at the beginning).
I am creating the UML model in Enterprise Architect, if that matters.
Here is the picture of the classes (two alternative diagrams; images not included).

There are several ways to model that
Put a constraint on association
This is the simplest and most obvious solution. Put a constraint describing the logic in a note symbol, in curly brackets {}, and link it to the association. The constraint can have any form, e.g. natural language or a formal language like OCL.
Note that in this case the multiplicity on the association itself has to range from 1 to the highest maximum achievable over all enumeration values.
The drawback is that the information is purely textual and might be difficult to comprehend.
Create subclasses (solution earlier suggested by Jim L. in his answer)
A subclass can redefine an attribute, e.g. by changing its multiplicity. At the parent level the class would have a multiplicity with the highest achievable maximum, while each "year specific" subclass would have a multiplicity matching the requirements for that year.
You still need to model a constraint for each subclass, defining which enumeration values are available for that particular subclass.
The drawback of this solution is that changing from one year to another is not a simple attribute change: the whole object has to be replaced by another one typed by a different subclass.
Multiplicity as variables
The idea is that you handle the mapping between the possible values and their related multiplicities separately, and on the association you express the multiplicity using attributes of that mapping rather than specific numbers.
This approach covers a number of possible detailed solutions, but I group them together as they all follow the same idea with only slight differences in how the multiplicity values are handled. I'll give just one detailed example (more to follow if someone asks).
One solution here is to use a data type rather than an enumeration. The data type has in its structure a name (which can still use the enumeration as a base) and two values (the lower and upper multiplicity bounds). Your attribute1 is then of that data type, and your multiplicities reference attribute1 and its specific properties.
Your data type might for example contain the properties name, minM and maxM, and then on the attribute you'll have the multiplicity minM..maxM.
Of course you need to add constraints ensuring that {0<=minM<=maxM} on the data type, and it's good to specify the set of possible values for the data type somewhere in the documentation, e.g. as a table.
A drawback of this solution is that the relationship between a specific value and its multiplicity restrictions is not shown directly on the diagram. Yet this is balanced by the much greater flexibility of the solution.
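To make the data-type idea a little more concrete, here is a minimal Python sketch. The property names name, minM and maxM come from the description above; the class names `YearKind` and `Holder`, the list-valued `attribute2`, and the two sample values are assumptions made purely for illustration, not something a UML tool would generate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class YearKind:
    """Data type used instead of the enumeration: a name plus multiplicity bounds."""
    name: str
    minM: int
    maxM: int

    def __post_init__(self):
        # Constraint on the data type: {0 <= minM <= maxM}
        if not (0 <= self.minM <= self.maxM):
            raise ValueError(f"invalid bounds {self.minM}..{self.maxM}")


@dataclass
class Holder:
    """Hypothetical owner of attribute1 and of the multi-valued attribute2."""
    attribute1: YearKind
    attribute2: List[int] = field(default_factory=list)

    def multiplicity_ok(self) -> bool:
        # attribute2 has multiplicity attribute1.minM..attribute1.maxM
        low, high = self.attribute1.minM, self.attribute1.maxM
        return low <= len(self.attribute2) <= high


# Assumed sample values; the real table of values would live in the documentation.
FIRST_YEAR = YearKind("First year", 1, 2)
SECOND_YEAR = YearKind("Second year", 3, 4)
```

Changing the year is then a plain attribute change on the holder, and the check simply follows the bounds stored in the new value.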
Multiplicity as a formula
If there is a simple relation between e.g. the year and the number of related elements that can be written as a formula, that formula can also be used in a multiplicity. This is especially useful if you split your enumeration into two separate attributes (hey, you've got a class; when choosing you can still use the enumeration, just map it inside the class!). I'll make this assumption in my example.
Let's say that instead of attribute1 you have two attributes: yearNo and yearType. Moreover, let's say that in the first year you can have 1 or 2 objects of class2 in your class1 object, in the second year 3 or 4, and so on. In general, in the n-th year you have 2n-1 to 2n elements, so your multiplicity will be 2*yearNo-1..2*yearNo.
A drawback of this approach is that it is possible only if there is a formula behind the multiplicities.
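The rule "2n-1 to 2n elements in the n-th year" can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for the example.

```python
from typing import Tuple


def multiplicity_for_year(year_no: int) -> Tuple[int, int]:
    """Return (lower, upper) bounds for the n-th year: 2n-1 .. 2n."""
    if year_no < 1:
        raise ValueError("year number must be at least 1")
    return 2 * year_no - 1, 2 * year_no


def count_allowed(year_no: int, count: int) -> bool:
    """Check whether a given number of related objects fits the year."""
    lower, upper = multiplicity_for_year(year_no)
    return lower <= count <= upper
```

For the first year this gives the bounds 1..2 and for the second year 3..4, matching the values in the question.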
Additional remarks:
I believe the split of attribute1 described under "Multiplicity as a formula" is a good idea regardless of which solution you choose.
Generalization has no multiplicity. This type of relationship shows that an object of a subclass is at the same time an object of the supertype. In my opinion you should not use this type of relationship here. Most probably you were thinking of shared/composite aggregation rather than generalization (but that has a different arrowhead: a diamond, not a triangle). Of course it can be quite safely replaced with an association.
Don't use an association to the Enumeration (or, in general, to a data type). If you put an attribute as a textual attribute in a class, link it with its type through a dependency (a dashed line with an open arrow). For an enumeration (a data type in general) this is the only relationship. For normal classes used as an attribute type, you can exchange the inline attribute for a (graphically shown) association (with a role name to make it fully replaceable).

I think you need to replace the enumeration with explicit subclasses under Class2. Each class can then redefine multiplicity for attribute2. While you could express this in OCL about the enumeration, few people would understand it.

Related

How to depict index of elements of ordered collection in UML diagrams?

I want to depict the index of elements of an ordered collection in an UML object diagram.
The only information I was able to find in the UML Spec 2.5.1 was in the part about semantics of associations 11.5.3.1.
When one or more ends of the Association are ordered, links carry ordering information in addition to their end values.
But either there is no guidance regarding the notation of such ordering information or I just didn't find it. I think I have seen a colon followed by the index in some tools. I wonder if there is a consensus or reference about how to depict indices on the links?
EDIT:
Although the existing answer is already holistic, let me add some clarification and context. As the first sentence already stated, I want to use this explicit information in an object diagram (maybe the parentheses were confusing, so I removed them). The object diagrams are used as part of test case specifications to communicate the object structure of the input, expected result and actual result. In that regard, the order of objects in a collection may play a role; e.g., imagine a test case specification for the correct implementation of the specification of a sorting algorithm.
I did not specify the kind of collection on purpose as I do not see how that would influence the answer as long as the collection is ordered. Typically, a sequence/list would come to my mind.
I do not need OCL in this case but I appreciate the answer taking that into consideration as formulating constraints on the order of collection elements is closely related.
UML
There is nothing foreseen for representing the index of an ordered collection in UML. In section 7.5.3.2 it is defined that ordering makes sense in relation to elements with multiplicity:
If the MultiplicityElement is specified as ordered (i.e., isOrdered is true), then the collection of values in an instantiation of this Element is ordered. This ordering implies that there is a mapping from positive integers to the elements of the collection of values. If a MultiplicityElement is not multivalued, then the value for isOrdered has no semantic effect.
The positive integer that is mapped corresponds to what you'd call an index. But nothing more is defined in the UML specs: not even whether the index should start at 0, at 1 or at any arbitrary value. It is not even said that the indexes have to be consecutive.
The UML specs explain in the same section, that the semantics of the ordered collections depend also on the unicity of their elements:
isOrdered   isUnique   Collection Type
false       true       Set
true        true       OrderedSet
false       false      Bag
true        false      Sequence
Unfortunately, the OrderedSet and Sequence are not defined in the UML specifications.
The only case where the ordering is defined more precisely is for properties defined as derived unions (section 9.5.3):
then the ordering of the union is defined by evaluating the subsetting properties in the order in which they appear in the result of allAttributes() and concatenating the results.
Conclusion: There is no way to define what the order is (e.g. link the order to some properties), and nothing is foreseen to refer to the indexes in the ordering.
OCL
The OCL language is a companion of UML. It is used to write more formal and precise constraints. It defines some more semantics for collections:
The OrderedSet is a Set, the elements of which are ordered. It contains no duplicates. OrderedSet is itself an instance of the metatype OrderedSetType.
An OrderedSet is not a subtype of Set, neither a subtype of Sequence. The common supertype of Sets and OrderedSets is Collection.
A sequence is a collection where the elements are ordered. An element may be part of a sequence more than once. Sequence is itself an instance of the metatype SequenceType.
Sequence is not a subtype of Bag. The common supertype of Sequence and Bag is Collection.
OCL uses the notion of index in several operations available for Sequence and OrderedSet:
The expression at(i) identifies the i-th element
The expression indexOf(v) returns the index of the element v
The expression first() returns the first element, it being understood that its index is 1
The expression last() returns the last element, it being understood that its index corresponds to the size of the collection.
These expressions are related to ordered collections and are not defined for unordered collections such as sets and bags.
Conclusion: you can use indexes in UML constraints by using OCL, and with the help of constraints you can even relate the order to properties.
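To illustrate the 1-based indexing implied by the operations listed above, here is a small Python sketch that mimics at, indexOf, first and last on an ordinary list. It is not OCL and not a tool API; the helper names are chosen only to mirror the operation names.

```python
from typing import List, TypeVar

T = TypeVar("T")


def at(seq: List[T], i: int) -> T:
    """Return the i-th element, counting from 1 as the OCL operations do."""
    if not 1 <= i <= len(seq):
        raise IndexError(f"index {i} outside 1..{len(seq)}")
    return seq[i - 1]


def index_of(seq: List[T], value: T) -> int:
    """Return the 1-based index of the first occurrence of value."""
    return seq.index(value) + 1


def first(seq: List[T]) -> T:
    return at(seq, 1)


def last(seq: List[T]) -> T:
    # The index of the last element equals the size of the collection.
    return at(seq, len(seq))
```

In a test case specification, such helpers could express expectations like "the element at position 1 is the smallest one" without relying on any notation in the diagram itself.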
Edit: More about object diagrams
An object diagram represents instances of objects. The association lines between these objects hence represent the “links” mentioned in your quote.
While a notation exists to specify values of object properties, nothing is defined for the links:
Pragmatically, you could just number the end of the link (there should be no confusion with multiplicity since it’s a link and not an association). If you fear some confusion, you may prefix the number with an informal # or Nr. .
Alternatively, if you have to stay 100% compliant, you may put the order information in a note symbol.
Reading section 9.8.4 page 126, and considering the use of = to specify values within instances, I think that it could be argued that order=1 would be valid, since the only value that is not already defined for the link through the instances at both ends is the ordering information.

Gremlin Vertex id vs label: advantages?

I am designing a graph, and see examples where several vertices will have a similar label, such as 'user', etc. When knowing its unique value, one can assign it to the vertex' id, and look it up as:
g.V('person').has('id','unique-value'). ...
Or assign that unique value as a label, and reference it that way.
g.V('unique-value'). ...
Is there a particular reason not to use unique values (an id, essentially) as a label, such as performance? What is the best strategy for this?
Your question and your Gremlin examples don't quite align. I think that you mean to compare:
g.V().hasLabel('person').has(T.id,'unique-value')
and
g.V('unique-value')
Note my corrections in that first Gremlin statement. V() does not take a vertex label as an argument - it can only take a vertex id or a Vertex object. Also, the actual vertex identifier must be referenced by T.id and not 'id', the latter being a reference to a user-defined property named "id". T.id is what you get returned from g.V().id(). We often refer to T.id as just id and I will do so going forward.
With that straightened out, there is no need for hasLabel('person') if you have the id handy, so the two examples above return the same value. I would think that most graph databases would optimize away the label filter and just use the id for their lookup, so I wouldn't expect a difference in performance. For readability, though, I'd stick to just using V('unique-value').
Your question specifically asked about using a unique label as a way to identify a vertex, so I will also address that. A label is not meant for unique identification of a graph element. It is meant to categorize groups of elements. Aside from that convention, I think there are a number of technical reasons not to do that. Some graphs have limits on the number of labels you can have so that could be a problem depending on your graph provider. At the very least, you reduce the portability of your code by doing that. I think it would impact performance as label lookups are not going to be as fast as id lookups (especially as you scale the graph up in size).
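The difference between an id lookup and a label filter can be pictured with a toy in-memory model. The sketch below is plain Python, not TinkerPop or any graph provider's API; `ToyGraph`, `add_vertex`, `V` and `has_label` are hypothetical names, and how a real graph database indexes ids and labels is up to the provider.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Vertex:
    id: str      # unique identifier of the element
    label: str   # category shared by many elements, e.g. 'person'


class ToyGraph:
    def __init__(self) -> None:
        # Vertices are keyed by id, so an id lookup is a single dict access.
        self._by_id: Dict[str, Vertex] = {}

    def add_vertex(self, vertex_id: str, label: str) -> Vertex:
        if vertex_id in self._by_id:
            raise ValueError(f"duplicate id {vertex_id!r}")
        vertex = Vertex(vertex_id, label)
        self._by_id[vertex_id] = vertex
        return vertex

    def V(self, vertex_id: str) -> Optional[Vertex]:
        """Look up one vertex by its id."""
        return self._by_id.get(vertex_id)

    def has_label(self, label: str) -> List[Vertex]:
        """Return all vertices in a category; in this toy model that is a full scan."""
        return [v for v in self._by_id.values() if v.label == label]
```

In this model the label groups vertices and the id picks out exactly one, which is the division of roles described above.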

Aggregation and navigability at the same end

In UML, is it possible to draw an aggregation where the component object can access the composite object? Like in this image, but with only one association line, so the association end touching A would have a diamond and an arrow.
If that isn't possible, is the diagram I drew valid? If not, why?
From another point of view, navigability is important because it shows how one can navigate the model and how instances can be accessed.
Another point concerns OCL: if navigability is not defined, some OCL queries will be hard to write.
Specification describes (p 198): Navigability means that instances participating in links at runtime (instances of an Association) can be accessed efficiently from instances at the other ends of the Association. The precise mechanism by which such efficient access is achieved is implementation specific. If an end is not navigable, access from the other ends may or may not be possible, and if it is, it might not be efficient.
And about Property class (p 149): The query isNavigable() indicates whether it is possible to navigate across the property. body: not classifier->isEmpty() or association.navigableOwnedEnd->includes(self).
So whether or not you model navigability matters.
If you want to have navigability in both side, the following image shows that:
But in section 6 of the specification, it is written:
An association with neither end marked by navigability arrows means that the association is navigable in both directions.
Arrow notation is used to denote association end navigability. By definition, all class-owned association ends are navigable. By convention, all association owned ends in the metamodel are not navigable.
So the following diagram is the same as the one above. This is tricky, but it seems to be true.
Of course that's possible.
If you want to save space, you can use a single line for the association:
Here's my personal opinion about navigability: The navigational arrow is not needed as the existence of the property owner implies that already.
P. 110 of the specs:
When a Property is owned by a Classifier other than an Association via ownedAttribute, then it represents an attribute of the Classifier.
P. 200:
Navigability notation was often used in the past according to an informal convention, whereby non-navigable ends were assumed to be owned by the Association whereas navigable ends were assumed to be owned by the Classifier at the opposite end. This convention is now deprecated. Aggregation type, navigability, and end ownership are separate concepts, each with their own explicit notation. Association ends owned by classes are always navigable, while those owned by associations may be navigable or not.
But what is an association that is just named? It's a useless construct so far. What you intend is to finally create attributes in either class by means of properties, in which case you add this (new) dot. To me this is simply over-constructed and impractical. Who is really using those dots? In EA you have to open sub-menus to make them appear. For me (and probably most UML readers) a role name represents a property and thus an attribute on the other side. That's just "(my) human logic" behind that proposition. So, my practical approach:
Use simple connectors for associations
Optionally add (non-)navigability (crosses and) arrows
Later add role names to indicate the use of attributes at the other end (rather than adding a typed attribute to the class' list).
And that's it. Just forget that silly dot that's a) hard to produce (in EA) and b) even harder to recognize.
Once again: this last part here is my recommendation for practical modeling.

Eclipse Constraint Programming - search/6

I'm having trouble understanding this documentation for the search/6 function in the ECLiPSe constraint programming framework.
I understand that the choice parameter basically affects the value ordering.
It also seems like the selection method chooses the variable ordering, but I don't entirely understand all the options for it.
I don't really understand the other parameters so I was wondering if someone could explain them in words. I have a pretty good understanding of the theory of constraint logic programming so feel free to refer to those concepts. I just don't understand a lot of the CS lingo in that documentation (arity, etc.)
Thank you
I'll try to answer it as briefly as possible, since search/6 is one of the most complex predicates you can find in the ECLiPSe system.
Any more detailed follow-up questions would probably better be asked in the ECLiPSe user mailing list, though.
The search/6 predicate is a generic predicate for controlling the search for a solution of a CLP problem. It allows the user to control the shape of the search tree (the order of variables along the branches, the order of the branches, and the portion of the search tree that is visited). The predicate has 6 parameters: search(+L, ++Arg, ++Select, +Choice, ++Method, +Option). (+ and ++ denote the mode of the parameter)
The first two parameters go together. L is either a list of variables or a list of terms. If it's the former, the Arg must be 0, if it's the latter, then Arg denotes the position of the variables that should be instantiated during the search, e.g.:
search([A,B],0,input_order,indomain,complete,[]).
or
search([p(1,A),p(2,B)],2,input_order,indomain,complete,[]).
In both cases, the variables A and B are instantiated during search.
The third parameter is the selection method. This method is used by search/6 to select the next variable from the list L to instantiate.
The simplest option is input_order: the search simply iterates over the variables in the list. In the examples above, it would instantiate A first, then B. The other options consider the domain size and/or the number of constraints attached to the variables, and make the selection accordingly. E.g., first_fail chooses the variable with the smallest domain. If the current domain of A is [1,2,3] and B has the domain [1,3], then B will be selected and instantiated first. If more than one variable has the same smallest domain size, then the first of these by input order will be selected. Selection methods that take the domain size into account achieve a dynamic variable ordering, since the domain sizes will change (shrink) during search, depending on the amount of propagation that the constraints achieve.
The other selection methods should now be self-explanatory.
It is also possible to define your own selection method, provided that the predicate that implements it has arity 2, i.e., has two parameters. The predicate must take a variable as input and calculate some criterion value. The variable with the smallest criterion value will be selected.
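The selection step described above can be sketched outside ECLiPSe in a few lines of Python: compute a criterion value per variable and pick the variable with the smallest one. This is only an illustration of the idea; the function names and the way domains are passed around are assumptions of the sketch, not the ECLiPSe interface.

```python
from typing import Callable, Dict, List

Domains = Dict[str, List[int]]
Criterion = Callable[[str, Domains], int]


def input_order_criterion(var: str, domains: Domains) -> int:
    # Same value for every variable, so the first one in the list wins.
    return 0


def first_fail_criterion(var: str, domains: Domains) -> int:
    # Smaller current domain means smaller criterion value.
    return len(domains[var])


def select(variables: List[str], domains: Domains, criterion: Criterion) -> str:
    """Pick the variable with the smallest criterion value.

    min() returns the first minimal element, so ties are resolved
    by input order, as described for first_fail above.
    """
    return min(variables, key=lambda v: criterion(v, domains))
```

With A having the domain [1,2,3] and B the domain [1,3], the first-fail criterion selects B, while the input-order criterion selects A.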
The fourth parameter is the choice method. Once a variable is selected, the choice method controls the order in which the values in its domain are tried during search.
The simplest option is indomain, which chooses the values in the variable's current domain in ascending order. I.e., if variable A has the domain [1,3,5], then the search will initially bind A to 1, on backtracking bind it to 3, and finally to 5. indomain_middle will start with 3, then 1, then 5.
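The two value orderings mentioned above can be written down as a short Python sketch. It reproduces the example (3, then 1, then 5 for the domain [1,3,5]); how the middle is chosen for even-sized domains and how ties are broken are assumptions of this sketch and may differ from what ECLiPSe actually does.

```python
from typing import List


def indomain_order(domain: List[int]) -> List[int]:
    """Try values in ascending order."""
    return sorted(domain)


def indomain_middle_order(domain: List[int]) -> List[int]:
    """Try the middle value first, then values at growing distance from it.

    Assumption: the middle is the median element (the lower one for an
    even-sized domain) and, at equal distance, the smaller value comes first.
    """
    values = sorted(domain)
    middle = values[(len(values) - 1) // 2]
    return sorted(values, key=lambda v: (abs(v - middle), v))
```

Both functions only decide the order of the branches below one variable; which variable is branched on is the job of the selection method.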
The more complex choice methods (i.e., other than indomain) will remove a tried value on backtracking, i.e., basically add additional constraints like A#\=1. This will cause additional propagation which may in turn allow earlier detection of infeasibilities. You can see the effect when running the n-queens example from the search/6 documentation that you linked to in your question.
Again, it is also possible to define your own choice method. The predicate must be of arity 1 or 3. If the arity is 1, then the predicate takes one variable as input and binds it to a value (or makes some other choice which alters the domain of the variable). If the arity is 3, then you can use the two additional parameters to pass along some state information which you can use to make the choice.
The fifth parameter is the search method. This controls the size of the section of the search tree that the search should explore (whereas the selection method controls the order of the variables along the branches of the tree, and the choice method controls the order of the branches in the search tree).
The simplest option is complete, which searches the tree left-to-right until the tree is exhausted. All other options (apart from symmetry breaking) are incomplete search methods, i.e., there will be branches in the search tree that are left unexplored. If the solution is on the leaf of such an unexplored branch, then it will not be found. You have to make sure that selection and choice methods shape the search tree in a way that the incomplete search method is able to find the solution. The option bbs, for instance, restricts the number of backtracks that can be made during search. If that number is exhausted, then the search will stop.
Symmetry breaking will only exclude branches that are equivalent (symmetrical) to other branches, in some way.
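The difference between a complete search and a search with a bounded number of backtracks can be illustrated with a small depth-first labelling routine in Python. This is a simplified sketch: there is no constraint propagation, the `consistent` callback stands in for the constraints, and the way backtracks are counted here is an assumption that need not match ECLiPSe's bbs option.

```python
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, int]


class BacktrackLimitReached(Exception):
    """Raised internally when the backtrack budget is used up."""


def search(
    variables: List[str],
    domains: Dict[str, List[int]],
    consistent: Callable[[Assignment], bool],
    max_backtracks: Optional[int] = None,
) -> Optional[Assignment]:
    """Depth-first labelling; max_backtracks=None means a complete search."""
    backtracks = 0

    def label(i: int, assignment: Assignment) -> Optional[Assignment]:
        nonlocal backtracks
        if i == len(variables):
            return dict(assignment)          # every variable has a value
        var = variables[i]
        for value in domains[var]:
            assignment[var] = value
            if consistent(assignment):
                result = label(i + 1, assignment)
                if result is not None:
                    return result
            del assignment[var]              # undo the choice: one backtrack
            backtracks += 1
            if max_backtracks is not None and backtracks > max_backtracks:
                raise BacktrackLimitReached
        return None

    try:
        return label(0, {})
    except BacktrackLimitReached:
        return None                          # part of the tree was left unexplored
```

With a small budget the routine can give up before reaching a solution that a complete search would find, which is why the variable and value ordering matter so much for incomplete methods.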
The sixth parameter is a list of possible additional options, described in the search/6 documentation. Normally, you won't need them.

What is the difference between a map and a dictionary?

I know a map is a data structure that maps keys to values. Isn't a dictionary the same? What is the difference between a map and a dictionary [1]?
[1] I am not asking how they are defined in language X or Y (which seems to be what people generally ask here on SO); I want to know what their difference is in theory.
Two terms for the same thing:
"Map" is used by Java, C++
"Dictionary" is used by .Net, Python
"Associative array" is used by PHP
"Map" is the correct mathematical term, but it is avoided because it has a separate meaning in functional programming.
Some languages use still other terms ("Object" in Javascript, "Hash" in Ruby, "Table" in Lua), but those all have separate meanings in programming too, so I'd avoid them.
See here for more info.
One is an older term for the other. Typically the term "dictionary" was used before the mathematical term "map" took hold. Also, dictionaries tend to have a key type of string, but that's not 100% true everywhere.
Summary of Computer Science terminology:
a dictionary is a data structure representing a set of elements, with insertion, deletion, and tests for membership; the elements may be, but are not necessarily, composed of distinct key and value parts
a map is an associative data structure able to store a set of keys, each associated with one (or sometimes more than one - e.g. C++ multimap) value, with the ability to access and erase existing entries given only the key.
Discussion
Answering this question is complicated by programmers having seen the terms given more specific meanings in particular languages or systems they've used, but the question asks for a language agnostic comparison "in theory", which I'm taking to mean in Computing Science terms.
The terminology explained
The Oxford University Dictionary of Computer Science lists:
dictionary any data structure representing a set of elements that can support the insertion and deletion of elements as well as test for membership
For example, we have a set of elements { A, B, C, D... } that we've been able to insert and could start deleting, and we're able to query "is C present?".
The Computing Science notion of map though is based on the mathematical linguistic term mapping, which the Oxford Dictionary defines as:
mapping An operation that associates each element of a given set (the domain) with one or more elements of a second set (the range).
As such, a map data structure provides a way to go from elements of a given set - known as "keys" in the map, to one or more elements in the second set - known as the associated "value(s)".
The "...or more elements in the second set" aspect can be supported by an implementation in two distinct ways:
Many map implementations enforce uniqueness of the keys and only allow each key to be associated with one value, but that value might itself be a data structure containing many values of a simpler data type, e.g. { {1, {"one", "ichi"}}, {2, {"two", "ni"}} } illustrates values consisting of pairs/sets of strings.
Other map implementations allow duplicate keys each mapping to the same or different values - which functionally satisfies the "associates...each [key] element...with...more [than one] [value] elements" case. For example, { {1, "one"}, {1, "ichi"}, {2, "two"}, {2, "ni"} }.
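The two ways of associating a key with more than one value can be shown side by side in Python, using the same sample data as above. This is just an illustration; the helper names `values_for` and `group_by_key` are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Way 1: unique keys, each mapped to one value that is itself a collection.
unique_keys: Dict[int, List[str]] = {1: ["one", "ichi"], 2: ["two", "ni"]}

# Way 2: duplicate keys, modelled here as a plain list of (key, value) pairs.
duplicate_keys: List[Tuple[int, str]] = [
    (1, "one"), (1, "ichi"), (2, "two"), (2, "ni"),
]


def values_for(pairs: List[Tuple[int, str]], key: int) -> List[str]:
    """Collect every value stored under the given key."""
    return [value for k, value in pairs if k == key]


def group_by_key(pairs: List[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Convert the duplicate-key form into the unique-key form."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)
```

Both forms carry the same information; they differ only in whether the "more than one value" part lives in the container or in the value.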
Dictionary and map contrasted
So, using the strict Comp Sci terminology above, a dictionary is only a map if the interface happens to support additional operations not required of every dictionary:
the ability to store elements with distinct key and value components
the ability to retrieve and erase the value(s) given only the key
A trivial twist:
a map interface might not directly support a test of whether a {key,value} pair is in the container, which is pedantically a requirement of a dictionary where the elements happen to be {key,value} pairs; a map might not even have a function to test for a key, but at worst you can see if an attempted value-retrieval-by-key succeeds or fails, then if you care you can check if you retrieved an expected value.
Communicate unambiguously to your audience
⚠ Despite all the above, if you use dictionary in the strict Computing Science meaning explained above, don't expect your audience to follow you initially, or be impressed when you share and defend the terminology. The other answers to this question (and their upvotes) show how likely it is that "dictionary" will be synonymous with "map" in the experience of most programmers. Try to pick terminology that will be more widely and unambiguously understood: e.g.
associative container: any container storing key/value pairs with value-retrieval and erasure by key
hash map: a hash table implementation of an associative container
hash set enforcing unique keys: a hash table implementation of a dictionary storing element/values without treating them as containing distinct key/value components, wherein duplicates of the elements can not be inserted
balanced binary tree map supporting duplicate keys: ...
Crossreferencing Comp Sci terminology with specific implementations
C++ Standard Library
maps: map, multimap, unordered_map, unordered_multimap
other dictionaries: set, multiset, unordered_set, unordered_multiset
note: with iterators or std::find you can erase an element and test for membership in array, vector, list, deque etc, but the container interfaces don't directly support that because finding an element is spectacularly inefficient at O(N), in some cases insert/erase is inefficient, and supporting those operations undermines the deliberately limited API the container implies - e.g. deques should only support erase/pop at the front and back and not in terms of some key. Having to do more work in code to orchestrate the search gently encourages the programmer to switch to a container data structure with more efficient searching.
...may add other languages later / feel free to edit in...
My 2 cents.
Dictionary is an abstract class in Java whereas Map is an interface. Since Java does not support multiple inheritance, if a class extends Dictionary, it cannot extend any other class.
Therefore, the Map interface was introduced.
Dictionary class is obsolete and use of Map is preferred.
Typically I assume that a map is backed by a hash table; it connotes an unordered store.
Dictionaries connote an ordered store.
There is a tree-based dictionary called a Trie.
In Lisp, it might look like this:
(a (n (d t)) n d )
Which encapsulates the words:
a
and
ant
an
ad
The traversal from the top to the leaf yields a word.
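For comparison, the same five words can be stored in a trie built from nested Python dicts. This is a minimal sketch; the end-of-word marker "$" is an assumption added here so that words which are prefixes of other words (such as "a" and "an") can be told apart from mere prefixes.

```python
from typing import Dict, Iterable, List

END = "$"  # assumed marker: the path from the root to this node spells a word


def make_trie(words: Iterable[str]) -> Dict:
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})   # walk down, creating nodes as needed
        node[END] = True
    return root


def contains(root: Dict, word: str) -> bool:
    node = root
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


def words_in(node: Dict, prefix: str = "") -> List[str]:
    """List all stored words by traversing from the top down."""
    found: List[str] = []
    for key in sorted(node):
        if key == END:
            found.append(prefix)
        else:
            found.extend(words_in(node[key], prefix + key))
    return found
```

Because children are visited in sorted order, the traversal yields the words in alphabetical order, which fits the "ordered store" connotation mentioned above.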
Not really the same thing. Maps are a subset of dictionaries. Dictionary is defined here as having the insert, delete, and find functions. Map as used by Java (according to this) is a dictionary with the requirement that keys mapping to values are strictly mapped as a one-to-one function. A dictionary might have more than one key map to one value, or one key map to several values (like chaining in a hash table), e.g. Twitter hashtag searches.
As a more "real world" example, looking up a word in a dictionary can give us a number of definitions for the same word, and when we find an entry that points us to another entry (see other word), a number of words for the same list of definitions. In the real world, maps are much broader, allowing us to have locations for names or names for coordinates, but also we can find a nearest neighbor or other attributes (populations, etc), so IMHO there could be argument for a greater expansion of the map type to possibly have graph based implementations, but it would be best to always assume just the key-value pair, especially since nearest neighbor and other attributes to the value could all just be data members of the value.
Java maps, despite the one-to-one requirement, can implement something more like a generalized dictionary if the value is generalized as a collection itself, or if the values are merely references to collections stored elsewhere.
Remember that Java maintainers are not the maintainers of ADT definitions, and that Java decisions are specifically for Java.
Other terms for this concept that are fairly common: associative array and hash.
Yes, they are the same, you may add "Associative Array" to the mix.
Using Hashtable or a Hash often refers to the implementation.
These are two different terms for the same concept.
Hashtable and HashMap also refer to the same concept.
So, on a purely theoretical level:
A Dictionary is a value that can be used to locate a linked value.
A Map is a value that provides instructions on how to locate other values.
All collections that allow non-linear access (i.e. more than just get-first or get-last) are maps, as even a simple array has an index that maps to the correct value. So while a Dictionary is a type of map, maps cover a much broader range of possible functions.
In practice it's usually the mapping function that defines the name, so a HashMap is a mapped data structure that uses a hashing algorithm to link the key to the value, whereas a Dictionary doesn't specify how the keys are linked to values, so it could be stored via a linked list, tree or any other algorithm. From the usage end you usually don't care what the algorithm is, only that it works, so you use a generic dictionary and shift to one of the other structures only when you need to enforce the type of algorithm.
The main difference is that a Map requires that all entries (key and value pairs) have a unique key. If collisions occur, i.e. when a new entry has the same key as an entry already in the collection, then collision handling is required.
Usually, we handle collisions using either separate chaining or linear probing.
A Dictionary allows for multiple entries to be linked to the same key.
When a Map has implemented Separate Chaining, then it tends to resemble a Dictionary.
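For readers who have not seen separate chaining, here is a minimal Python sketch of a hash table that uses it. Note that in the usual hash-table meaning of the term, chaining resolves different keys that land in the same bucket; in this sketch an existing key is overwritten rather than duplicated. The class name `ChainedMap` and its methods are made up for the example.

```python
from typing import Any, Hashable, List, Tuple


class ChainedMap:
    """Hash table with separate chaining: each bucket is a list of entries."""

    def __init__(self, bucket_count: int = 8) -> None:
        self._buckets: List[List[Tuple[Hashable, Any]]] = [
            [] for _ in range(bucket_count)
        ]

    def _bucket(self, key: Hashable) -> List[Tuple[Hashable, Any]]:
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key: Hashable, value: Any) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)     # same key: replace the value
                return
        bucket.append((key, value))          # new key: extend the chain

    def get(self, key: Hashable, default: Any = None) -> Any:
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default
```

Creating the table with a single bucket forces every key into the same chain, which makes the chaining behaviour easy to observe.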
I'm in a data structures class right now and my understanding is the dict() data type that can also be initialized as just dictionary = {} or with keys and values, is basically the same as how the list/array data type is used to implement stacks and queues. So, dict() is the type and maps are a resulting data structure you can choose to implement with the dictionary data type in the same way you can use the list type and choose to implement a stack or queue data structure with it.

Resources