Hi, I am new to graph database modeling and have some doubts about expressing an endorsement for a service provided by a person. The use case is the following: PersonA gives an endorsement to a Service provided by PersonB.
The key point is that if I am the recipient of the endorsement, I would like to know who has endorsed me. I have come up with several scenarios for how I could potentially do that, but because of my lack of experience I am not sure which would be the best approach.
Scenario 1:
The endorsement is expressed directly as a relationship, and the service becomes a property of the endorsement, so it will look like:
PersonA-------ENDORSE{service}--->PersonB
Scenario 2:
I model an entity named Service. The problem is that when I create the ENDORSE relationship to the service, I would lose the information about whom I am endorsing, so I would have to keep a property on the relationship saying whom I am endorsing. Then PersonB would acquire the endorsement for the Service, but he would not know who actually gave it. So... it will look like this:
PERSONA----ENDORSE{personB}--->Service------ENDORSMENT{personA}--->PERSONB
Does this make sense ?
Scenario 3:
I normalize the second relationship, "ENDORSMENT", and exclude personA as a property, but then I need to query all Person nodes to find out whom they have endorsed.
How would you model this kind of relationship ?
Two important principles for validating a data model for a graph database:
if an entity or fact can be used more than once, then it should be stored as a node
if a relationship between two nodes needs to store node identifiers, then that relationship must be transformed into a node
So #Raj pointed in the right direction, in which case the model might look like this:
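A minimal sketch of that kind of model, using a toy in-memory graph in Python rather than Neo4j itself: the endorsement is reified as its own node (per the second principle), linked to the endorser, the endorsed person and a single shared Service node, so the recipient can answer "who endorsed me?" by walking incoming edges. The class, function and relationship names (GAVE, RECEIVED_BY, FOR, PROVIDES) are hypothetical placeholders.

```python
from collections import defaultdict

class Graph:
    """Toy directed graph: nodes carry a label, edges are (type, neighbour)."""
    def __init__(self):
        self.nodes = {}
        self.out_edges = defaultdict(list)  # src -> [(rel_type, dst)]
        self.in_edges = defaultdict(list)   # dst -> [(rel_type, src)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_edge(self, src, rel_type, dst):
        self.out_edges[src].append((rel_type, dst))
        self.in_edges[dst].append((rel_type, src))

def endorse(g, endorsement_id, endorser, endorsee, service):
    # The endorsement is a node, so it can point at all three parties
    # without storing node identifiers as relationship properties.
    g.add_node(endorsement_id, "Endorsement")
    g.add_edge(endorser, "GAVE", endorsement_id)
    g.add_edge(endorsement_id, "RECEIVED_BY", endorsee)
    g.add_edge(endorsement_id, "FOR", service)

def who_endorsed(g, person, service=None):
    """Return (endorser, service) pairs for endorsements received by person."""
    result = []
    for rel, e in g.in_edges[person]:
        if rel != "RECEIVED_BY":
            continue
        svc = next(dst for r, dst in g.out_edges[e] if r == "FOR")
        if service is not None and svc != service:
            continue
        for r, src in g.in_edges[e]:
            if r == "GAVE":
                result.append((src, svc))
    return result

g = Graph()
for p in ("personA", "personB", "personC"):
    g.add_node(p, "Person")
g.add_node("plumbing", "Service")
g.add_node("tutoring", "Service")
g.add_edge("personB", "PROVIDES", "plumbing")
g.add_edge("personB", "PROVIDES", "tutoring")
endorse(g, "e1", "personA", "personB", "plumbing")
endorse(g, "e2", "personC", "personB", "plumbing")
endorse(g, "e3", "personA", "personB", "tutoring")

print(who_endorsed(g, "personB", "plumbing"))
# [('personA', 'plumbing'), ('personC', 'plumbing')]
```

Because each service exists once as a node, every endorsement of it hangs off the same Service node instead of duplicating it.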
I recommend reading these:
https://neo4j.com/graph-databases-book/
http://patterns.dataincubator.org/book/
The second approach looks good; you don't have to add these properties to relationships.
It's possible to get the person A who endorsed person B for service S.
The only issue is that there will be multiple nodes for any service S. If that's not acceptable, you can replace the Service node in the second approach with an Endorse node E and connect this E to a service node S.
So there will be four types of nodes.
EDIT:
Adding an image for clarification.
Rename REL1 and REL2 as you wish.
#Stdob suggested some good names for these relationships.
When using graph databases (Neo4j, in my case), we can represent the same information in many ways: making each entity a node and connecting the entities through relationships, or just adding the entities to the attribute list of a node.
Following are two different representations of the same data.
Overall, which mechanism is suitable in which conditions?
My use case involves traversing the database from different nodes up to a depth of 4 and examining the information through connected nodes or attributes (depending on the approach).
One query of interest may be, "Who are the friends of John who went to Stanford?"
What is the difference in terms of storage and computation?
Normally, properties are loaded lazily and are more expensive to hold in cache, especially strings. Nodes and relationships are most effective for traversal, especially since the relationship types are stored together with the relationship records and thus don't trigger property loads when used in traversals.
Also, a balanced graph (that is, not many dense nodes with over say 10K relationships) is most effective to traverse.
I would try to model most of the recurring properties as nodes connected to the entities, thus using the graph itself to index these values, instead of having to fall back to filtering on property values or indexing the property with an expensive index lookup.
The first one is much better, since you're querying on entities such as Stanford, and that entity is related to many person nodes. In my opinion, modeling as nodes is more intuitive and easier to query. "Find all persons who went to Stanford" would not be very easy to do in your second model, as you don't have a place to start traversing from.
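To make the "place to start traversing from" point concrete, here is a sketch of both representations as plain Python dictionaries (made-up data, not Neo4j's storage format). With schools as nodes, the query starts at the Stanford node and follows its edges; with the school as a property, every person has to be inspected unless the property is separately indexed.

```python
# Model 1: schools are nodes; each school node knows who attended it.
attended_by = {
    "Stanford": ["John", "Mary", "Ann"],
    "MIT": ["Bob"],
}

# Model 2: the school is only an attribute stored on each person node.
people = {
    "John": {"school": "Stanford"},
    "Mary": {"school": "Stanford"},
    "Ann": {"school": "Stanford"},
    "Bob": {"school": "MIT"},
}

def stanford_via_nodes():
    # Start at the Stanford node: work is proportional to its degree.
    return sorted(attended_by.get("Stanford", []))

def stanford_via_properties():
    # No starting node: scan all people and filter on the property.
    return sorted(name for name, props in people.items()
                  if props.get("school") == "Stanford")

print(stanford_via_nodes())       # ['Ann', 'John', 'Mary']
print(stanford_via_properties())  # ['Ann', 'John', 'Mary']
```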
I'd use attributes mainly to describe the node/entity and to filter the results of a query, e.g. "Who are the friends of John who went to Stanford in the year 2010?" In this case, the year attribute would just be used to trim the results. It depends on your use case: if the year is really important and drives a lot of queries, or is used to represent a timeline, you could even model the year as a node attached to Stanford.
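A sketch of the "attribute only trims the results" idea, assuming the year is stored on a hypothetical ATTENDED edge (it could just as well sit on the person); the traversal itself only follows FRIEND and ATTENDED, and the year is checked on the matching edge. Data and names are made up for illustration.

```python
# Hypothetical toy data: FRIEND adjacency and ATTENDED edges carrying a year.
friends = {"John": ["Mary", "Bob", "Ann", "Eve"]}
attended = {
    "Mary": [("Stanford", 2010)],
    "Bob": [("MIT", 2010)],
    "Ann": [("Stanford", 2012)],
    "Eve": [("Stanford", 2010)],
}

def friends_at_school(person, school, year=None):
    """Friends of person who attended school, optionally in a given year.

    Traversal: person -> FRIEND -> friend -> ATTENDED -> school.
    The year is only a filter on the matching edge, not a traversal step.
    """
    result = []
    for friend in friends.get(person, []):
        edges = attended.get(friend, [])
        if any(s == school and (year is None or y == year) for s, y in edges):
            result.append(friend)
    return result

print(friends_at_school("John", "Stanford"))        # ['Mary', 'Ann', 'Eve']
print(friends_at_school("John", "Stanford", 2010))  # ['Mary', 'Eve']
```

If the year started driving most queries, it would be the candidate for promotion to a node, as suggested above.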
I am writing a simple program safety checker in Prolog and I need a data structure to hold variable valuations. Since I want to detect when I am visiting the same state again, this structure must support reasonable comparison semantics, so that I can store visited states in a set.
library(avl) has convenient getter/setter interface.
The problem is that AVL trees holding the same mapping can take multiple shapes.
Thus two identical states would be considered distinct if their AVL representations differ.
A structure holding the mapping as an ordered list would be free of this problem. However, I can't find anything like that in the SICStus docs. Is there any standard structure that does what I need, or do I have to implement it myself?
There are ordered sets, but with AVL trees you can always convert them to ordered lists of key-value pairs and then compare those.
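The idea of comparing a canonical form instead of the tree itself can be sketched in Python (not Prolog), assuming a plain unbalanced binary search tree as a stand-in for an AVL tree: two trees built from the same pairs in different orders have different shapes, but their in-order traversals give the same sorted list, which is hashable once turned into a tuple and can go into a visited set. In SICStus, the corresponding conversion should be avl_to_list/2 from library(avl); check the manual for your version.

```python
def insert(node, key, value):
    """Insert into an immutable binary search tree.

    A node is None or a tuple (left, key, value, right).
    """
    if node is None:
        return (None, key, value, None)
    left, k, v, right = node
    if key < k:
        return (insert(left, key, value), k, v, right)
    if key > k:
        return (left, k, v, insert(right, key, value))
    return (left, k, value, right)  # existing key: replace its value

def to_sorted_pairs(node):
    """In-order traversal: the canonical form is the sorted (key, value) list."""
    if node is None:
        return []
    left, k, v, right = node
    return to_sorted_pairs(left) + [(k, v)] + to_sorted_pairs(right)

def build(pairs):
    tree = None
    for k, v in pairs:
        tree = insert(tree, k, v)
    return tree

t1 = build([("x", 1), ("y", 2), ("z", 3)])  # right-leaning chain
t2 = build([("y", 2), ("x", 1), ("z", 3)])  # balanced shape

print(t1 == t2)                                    # False: shapes differ
print(to_sorted_pairs(t1) == to_sorted_pairs(t2))  # True: same mapping

visited = {tuple(to_sorted_pairs(t1))}
print(tuple(to_sorted_pairs(t2)) in visited)       # True
```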
I have two different databases that are not connected in any way. In fact, one is a public school database and one is a HUD (housing) database. By law they are not allowed to share names and other specific identifying details. Birthdates and addresses are okay, along with zip codes and other more general IDs. The users need to be able to query the other database to get non-specific information, so it would appear that they need to share the same unique ID. I was considering using birthdates combined with name initials, or perhaps the last 4 digits of the SSN along with the birthdate. The client was thinking of global positioning data, but I'm concerned about apartments next to one another or families moving. Any ideas?
First you need to determine what your measure of uniqueness will be. If two different people in either database share the same value for your measure of uniqueness, you need to change your strategy. After that, add a constraint to both databases specifying that these properties (birthdate, SSN) are what make a Person record unique.
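A quick way to test a candidate measure of uniqueness before relying on it is to group each database's records by the candidate key and look for keys shared by more than one record. A sketch in Python, assuming hypothetical field names (record_id, birthdate, initials, ssn_last4) and made-up sample rows; a clean result on today's data does not guarantee uniqueness later, which is why the database constraint is still needed.

```python
from collections import defaultdict

def key_collisions(records, key_fields):
    """Group records by the candidate key; return keys used by more than one record."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        groups[key].append(rec["record_id"])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}

# Hypothetical sample rows; field names are illustrative only.
school = [
    {"record_id": "S1", "birthdate": "2010-04-02", "initials": "JD", "ssn_last4": "1234"},
    {"record_id": "S2", "birthdate": "2010-04-02", "initials": "JD", "ssn_last4": "9876"},
    {"record_id": "S3", "birthdate": "2011-01-15", "initials": "AK", "ssn_last4": "5555"},
]

print(key_collisions(school, ["birthdate", "initials"]))
# {('2010-04-02', 'JD'): ['S1', 'S2']}  -> not unique, change strategy
print(key_collisions(school, ["birthdate", "ssn_last4"]))
# {}  -> no collisions in this sample
```

The same check would be run against the housing database as well.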
I know a map is a data structure that maps keys to values. Isn't a dictionary the same? What is the difference between a map and a dictionary? [1]
[1] I am not asking how they are defined in language X or Y (which seems to be what people generally ask here on SO); I want to know what their difference is in theory.
Two terms for the same thing:
"Map" is used by Java, C++
"Dictionary" is used by .Net, Python
"Associative array" is used by PHP
"Map" is the correct mathematical term, but it is avoided because it has a separate meaning in functional programming.
Some languages use still other terms ("Object" in Javascript, "Hash" in Ruby, "Table" in Lua), but those all have separate meanings in programming too, so I'd avoid them.
See here for more info.
One is an older term for the other. Typically the term "dictionary" was used before the mathematical term "map" took hold. Also, dictionaries tend to have a key type of string, but that's not 100% true everywhere.
Summary of Computer Science terminology:
a dictionary is a data structure representing a set of elements, with insertion, deletion, and tests for membership; the elements may be, but are not necessarily, composed of distinct key and value parts
a map is an associative data structure able to store a set of keys, each associated with one (or sometimes more than one - e.g. C++ multimap) value, with the ability to access and erase existing entries given only the key.
Discussion
Answering this question is complicated by programmers having seen the terms given more specific meanings in particular languages or systems they've used, but the question asks for a language agnostic comparison "in theory", which I'm taking to mean in Computing Science terms.
The terminology explained
The Oxford University Dictionary of Computer Science lists:
dictionary any data structure representing a set of elements that can support the insertion and deletion of elements as well as test for membership
For example, we have a set of elements { A, B, C, D... } that we've been able to insert and could start deleting, and we're able to query "is C present?".
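In Python terms, a built-in set is a reasonable stand-in for a dictionary in this strict sense: it supports insertion, deletion and membership tests, with no separate key and value parts. A minimal sketch:

```python
# A "dictionary" in the strict sense: a set of elements supporting
# insertion, deletion and membership tests, with no key/value split.
elements = set()
for e in ("A", "B", "C", "D"):
    elements.add(e)       # insertion
elements.discard("B")     # deletion

print("C" in elements)    # True: membership test ("is C present?")
print("B" in elements)    # False: B was deleted
```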
The Computing Science notion of map though is based on the mathematical linguistic term mapping, which the Oxford Dictionary defines as:
mapping An operation that associates each element of a given set (the domain) with one or more elements of a second set (the range).
As such, a map data structure provides a way to go from elements of a given set - known as "keys" in the map, to one or more elements in the second set - known as the associated "value(s)".
The "...or more elements in the second set" aspect can be supported by an implementation in two distinct ways:
Many map implementations enforce uniqueness of the keys and only allow each key to be associated with one value, but that value might be able to be a data structure itself containing many values of a simpler data type, e.g. { {1, {"one", "ichi"}}, {2, {"two", "ni"}} } illustrates values consisting of pairs/sets of strings.
Other map implementations allow duplicate keys each mapping to the same or different values - which functionally satisfies the "associates...each [key] element...with...more [than one] [value] elements" case. For example, { {1, "one"}, {1, "ichi"}, {2, "two"}, {2, "ni"} }.
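The two approaches can be sketched in Python using the same example data. Python's dict cannot hold duplicate keys, so a list of (key, value) entries stands in for a duplicate-key multimap here (lookups on it are linear, unlike a real multimap such as C++'s std::multimap).

```python
from collections import defaultdict

# Way 1: unique keys; each value is itself a collection of simpler values.
unique_keys = {1: {"one", "ichi"}, 2: {"two", "ni"}}

# Way 2: duplicate keys allowed, modelled as a list of (key, value) entries.
multi = [(1, "one"), (1, "ichi"), (2, "two"), (2, "ni")]

def values_for(entries, key):
    """All values associated with key in the duplicate-key form."""
    return [v for k, v in entries if k == key]

def collapse(entries):
    """Convert the duplicate-key form into the unique-key form."""
    grouped = defaultdict(set)
    for k, v in entries:
        grouped[k].add(v)
    return dict(grouped)

print(values_for(multi, 1))            # ['one', 'ichi']
print(collapse(multi) == unique_keys)  # True
```

Both forms carry the same associations; they differ only in whether the "more than one value" lives inside the value or in repeated keys.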
Dictionary and map contrasted
So, using the strict Comp Sci terminology above, a dictionary is only a map if the interface happens to support additional operations not required of every dictionary:
the ability to store elements with distinct key and value components
the ability to retrieve and erase the value(s) given only the key
A trivial twist:
a map interface might not directly support a test of whether a {key,value} pair is in the container, which is pedantically a requirement of a dictionary where the elements happen to be {key,value} pairs; a map might not even have a function to test for a key, but at worst you can see if an attempted value-retrieval-by-key succeeds or fails, then if you care you can check if you retrieved an expected value.
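That "at worst" fallback looks like this in Python: retrieve by key, and if that succeeds, compare the retrieved value with the expected one. A private sentinel object is used so that a stored value of None is not mistaken for a missing key; contains_pair is a hypothetical helper name.

```python
_MISSING = object()  # sentinel distinct from any value a caller could store

def contains_pair(mapping, key, value):
    """Test whether the {key, value} pair is present using only key lookup."""
    found = mapping.get(key, _MISSING)
    return found is not _MISSING and found == value

m = {"one": 1, "two": 2, "none": None}
print(contains_pair(m, "one", 1))       # True
print(contains_pair(m, "one", 2))       # False: key present, value differs
print(contains_pair(m, "three", 3))     # False: key absent
print(contains_pair(m, "none", None))   # True: None stored as a real value
```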
Communicate unambiguously to your audience
⚠ Despite all the above, if you use dictionary in the strict Computing Science meaning explained above, don't expect your audience to follow you initially, or be impressed when you share and defend the terminology. The other answers to this question (and their upvotes) show how likely it is that "dictionary" will be synonymous with "map" in the experience of most programmers. Try to pick terminology that will be more widely and unambiguously understood: e.g.
associative container: any container storing key/value pairs with value-retrieval and erasure by key
hash map: a hash table implementation of an associative container
hash set enforcing unique keys: a hash table implementation of a dictionary storing element/values without treating them as containing distinct key/value components, wherein duplicates of the elements cannot be inserted
balanced binary tree map supporting duplicate keys: ...
Cross-referencing Comp Sci terminology with specific implementations
C++ Standard Library
maps: map, multimap, unordered_map, unordered_multimap
other dictionaries: set, multiset, unordered_set, unordered_multiset
note: with iterators or std::find you can erase an element and test for membership in array, vector, list, deque etc, but the container interfaces don't directly support that because finding an element is spectacularly inefficient at O(N), in some cases insert/erase is inefficient, and supporting those operations undermines the deliberately limited API the container implies - e.g. deques should only support erase/pop at the front and back and not in terms of some key. Having to do more work in code to orchestrate the search gently encourages the programmer to switch to a container data structure with more efficient searching.
...may add other languages later / feel free to edit in...
My 2 cents.
Dictionary is an abstract class in Java, whereas Map is an interface. Since Java does not support multiple inheritance of classes, a class that extends Dictionary cannot extend any other class.
Therefore, the Map interface was introduced.
The Dictionary class is obsolete, and use of Map is preferred.
Typically I assume that a map is backed by a hash table; it connotes an unordered store.
Dictionaries connote an ordered store.
There is a tree-based dictionary called a Trie.
In Lisp, it might look like this:
(a (n (d t)) n d )
Which encapsulates the words:
a
and
ant
an
ad
The traversal from the top to the leaf yields a word.
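A small trie holding the same five words can be sketched in Python with nested dicts. Because "a" and "an" end at internal nodes rather than leaves, this sketch marks the end of each word explicitly (the "$" marker is an arbitrary choice for illustration) instead of relying on leaves alone.

```python
END = "$"  # marks that the path so far spells a complete word

def make_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # follow or create the child
        node[END] = True
    return root

def contains(trie, word):
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node

def all_words(node, prefix=""):
    """Depth-first walk; every END marker reached yields a word."""
    words = []
    for ch, child in sorted(node.items()):
        if ch == END:
            words.append(prefix)
        else:
            words.extend(all_words(child, prefix + ch))
    return words

trie = make_trie(["a", "and", "ant", "an", "ad"])
print(all_words(trie))                           # ['a', 'ad', 'an', 'and', 'ant']
print(contains(trie, "an"), contains(trie, "ann"))  # True False
```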
Not really the same thing. Maps are a subset of dictionaries. A dictionary is defined here as having insert, delete, and find functions. A map as used by Java (according to this) is a dictionary with the requirement that keys are mapped to values strictly as a one-to-one function. A dictionary might have more than one key map to one value, or one key map to several values (like chaining in a hashtable), e.g. Twitter hashtag searches.
As a more "real world" example, looking up a word in a dictionary can give us a number of definitions for the same word, and when we find an entry that points us to another entry (see other word), a number of words for the same list of definitions. In the real world, maps are much broader, allowing us to have locations for names or names for coordinates, but we can also find a nearest neighbor or other attributes (populations, etc.), so IMHO there could be an argument for expanding the map type to possibly have graph-based implementations. Still, it would be best to always assume just the key-value pair, especially since the nearest neighbor and other attributes of the value could all just be data members of the value.
Java maps, despite the one-to-one requirement, can implement something more like a generalized dictionary if the value is generalized as a collection itself, or if the values are merely references to collections stored elsewhere.
Remember that Java maintainers are not the maintainers of ADT definitions, and that Java decisions are specifically for Java.
Other terms for this concept that are fairly common: associative array and hash.
Yes, they are the same; you may add "Associative Array" to the mix.
Using Hashtable or Hash often refers to the implementation.
These are two different terms for the same concept.
Hashtable and HashMap also refer to the same concept.
So, on a purely theoretical level:
A Dictionary is a value that can be used to locate a linked value.
A Map is a value that provides instructions on how to locate another value.
All collections that allow non-linear access (i.e. not only get first or get last) are a Map, as even a simple array has an index that maps to the correct value. So while a Dictionary is a type of Map, maps cover a much broader range of possible functions.
In practice, it's usually the mapping function that defines the name: a HashMap is a mapped data structure that uses a hashing algorithm to link the key to the value, whereas a Dictionary doesn't specify how the keys are linked to a value, so it could be stored via a linked list, a tree or any other algorithm. From the usage end, you usually don't care what the algorithm is, only that it works, so you use a generic dictionary and only shift to one of the other structures when you need to enforce the type of algorithm.
The main difference is that a Map requires that all entries (key and value pairs) have a unique key. If collisions occur, i.e. when a new entry has the same key as an entry already in the collection, then collision handling is required.
Usually, we handle collisions using either separate chaining or linear probing.
A Dictionary allows for multiple entries to be linked to the same key.
When a Map has implemented Separate Chaining, then it tends to resemble a Dictionary.
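For reference, in a hash table a collision usually means two distinct keys landing in the same bucket; separate chaining keeps both entries in that bucket's list, while putting an identical key again typically replaces its value. A minimal Python sketch of separate chaining (hypothetical class, no resizing, not a production implementation):

```python
class ChainedHashMap:
    """Minimal hash map using separate chaining: each bucket is a list
    of [key, value] entries whose keys hashed to that bucket."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for entry in bucket:
            if entry[0] == key:      # same key already present: replace value
                entry[1] = value
                return
        bucket.append([key, value])  # new key sharing a bucket: chain it

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

m = ChainedHashMap(num_buckets=2)  # tiny table to force bucket collisions
for i in range(5):
    m.put(i, str(i))
m.put(3, "three")                  # replaces the value for key 3

print(m.get(3), m.get(4))          # three 4
print([len(b) for b in m.buckets]) # [3, 2]
```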
I'm in a data structures class right now, and my understanding is that the dict() data type, which can also be initialized as just dictionary = {} or with keys and values, is used the same way the list/array data type is used to implement stacks and queues. So, dict() is the type, and maps are a resulting data structure you can choose to implement with the dictionary data type, in the same way you can use the list type and choose to implement a stack or queue data structure with it.