Give me a practical use-case of Multi-set - collections

I would like to know a few practical use-cases (ideally not tied to any programming language). I can associate Sets, Lists and Maps with practical use cases.
For example, if you wanted a glossary of a book, where the terms are listed alphabetically and the location/page number is the value, you would use a TreeMap (an OrderedMap, which is a Map).
Somehow, I can't associate MultiSets with any "practical" usecase. Does someone know of any uses?
http://en.wikipedia.org/wiki/Multiset does not tell me enough :)
PS: If you guys think this should be community-wiki'ed it is okay. The only reason I did not do it was "There is a clear objective way to answer this question".

Lots of applications. For example, imagine a shopping cart. It can contain more than one instance of an item, i.e. 2 CPUs, 3 graphics boards, etc. So it is a multiset. One simple implementation is to keep a count alongside each distinct item, i.e. store the info "2 CPUs, 3 graphics boards", etc.
I'm sure you can think of lots of other applications.
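As a rough illustration of the "item plus count" implementation, here is a minimal Python sketch using collections.Counter from the standard library. The item names and quantities are made up for the example; nothing here comes from a real shop system.

```python
from collections import Counter

# Counter maps each distinct item to the number of times it occurs,
# which is the "keep a count per item" implementation described above.
cart = Counter()
cart.update(["cpu", "cpu"])          # add two CPUs, one element at a time
cart["graphics board"] += 3          # add three graphics boards at once

cart["cpu"] -= 1                     # take one CPU back out of the cart

total_items = sum(cart.values())     # size of the multiset, duplicates included
distinct_items = len(cart)           # number of different products
```

A plain set could only say whether a CPU is in the cart; the multiset also remembers how many.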

A multiset is useful in many situations in which you'd otherwise have a Map. Here are three examples.
Suppose you have a class Foo with an accessor getType(), and you want to know, for a collection of Foo instances, how many have each type.
Similarly, a system could perform various actions, and you could use a Multiset to keep track of how many times each action occurred.
Finally, to determine whether two collections contain the same elements, ignoring order but paying attention to how often instances are repeated, simply call
HashMultiset.create(collection1).equals(HashMultiset.create(collection2))
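The answer above uses a Java multiset; for readers without that library, the first and third examples can be sketched in Python with collections.Counter. The Foo class and the helper name same_elements are hypothetical stand-ins, and the "type" field plays the role of getType().

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Foo:
    type: str  # stands in for the getType() accessor


# Example 1: how many Foo instances have each type?
foos = [Foo("a"), Foo("b"), Foo("a")]
type_counts = Counter(foo.type for foo in foos)


# Example 3: same elements, ignoring order but respecting repetition.
def same_elements(collection1, collection2):
    return Counter(collection1) == Counter(collection2)
```

For instance, same_elements([1, 2, 2], [2, 1, 2]) holds, while same_elements([1, 2, 2], [1, 1, 2]) does not, even though both pairs are equal when compared as sets.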

In some fields of Math, a set is treated as a multiset for all purposes. For example, in Linear Algebra, a set of vectors is treated as a multiset when testing for linear dependence.
Thus, implementations of these fields should benefit from the usage of multisets.
You may say linear algebra isn't practical, but that is a whole different debate...

A Shopping Cart is a MultiSet. You can put several instances of the same item in a Shopping Cart when you want to buy more than one.

Related

Graph database design: Should I add relationships, or just traverse

I have recently started exploring graph databases and Neo4J, and would like to work with my own data. At the moment I've hit some confusion. I've created an example image to illustrate my issue. In terms of efficiency, I'm wondering which option is better (and I want to get it right now in early days before I start handling larger amounts).
Option A: Using only the blue relationships, I can work out whether things are related to, or come under, the Ancient group. This process will be done many many times, however it is unlikely to be more than ~6 generations.
Option B: I implement the red relationships, so that it is much faster to work out if young structures belong to the Ancient group.
I'm trying not to use Labels in this scenario, as I'm trying to use labels for a specific purpose to simplify my life (linking structures across separate networks), and I'm not sure if I should have a label to represent a node that already exists.
In summary, I'm wondering whether adding a whole new bunch of relationships, whilst taking more space, is worth it, or whether traversing to find all relatives is such a simple/inexpensive task that it isn't worth doing so. Or alternatively, both options are viable and this isn't a real issue at all. Thanks for reading.
I'd go with Option A. One of the strengths of Neo4j is that it traverses relationships very efficiently and quickly, and so, there is no need to materialise relationships (sometimes, relationships are materialised in complex and/or extremely large graphs, but this is not your case).
Not sure why you don't want to use labels? Labels serve to group nodes into sets of the same type, and they are also index-backed, which makes it much faster to find the starting point of your query (an index lookup instead of a full database scan).
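To show why Option A is cheap at around 6 generations, here is a minimal sketch in plain Python (not Neo4j or Cypher). It assumes each structure has at most one parent; the node names and the parent_of dictionary are invented for the illustration.

```python
# child -> parent edges, i.e. only the "blue" relationships (names are made up)
parent_of = {
    "young": "middle",
    "middle": "old",
    "old": "ancient",
    "other": "unrelated_root",
}


def belongs_to(node, group, max_depth=6):
    """Walk up the parent links at most max_depth steps, looking for group."""
    current = node
    for _ in range(max_depth):
        current = parent_of.get(current)
        if current is None:      # reached a root without meeting the group
            return False
        if current == group:
            return True
    return False                 # gave up after max_depth generations
```

Each check touches at most max_depth edges, so the work per lookup stays small no matter how large the rest of the graph grows.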

Selecting optimal combinations

I have a problem that I am currently solving via brute force, but am looking for a more elegant solution. I have a system that runs various functions across multiple nodes. Each function is defined by a 'role'. Each 'role' can be configured so that one or more clients are allowed to hold it. Additionally, preference may be given to a particular client (or clients) over other clients.
The complexity comes in that it is also possible for 'roles' to be related to each other. For example, a client may only be able to hold 'RoleA' if they don't hold 'RoleB', or a client may only be able to hold 'RoleC' if they hold 'RoleD'. Additionally, roles can be related preferentially (i.e. it is preferred that a client holding 'RoleE' holds 'RoleF', but that this is not mandatory).
A client may advertise its willingness to hold any number of roles, but is not required to do so. i.e 'client1' may advertise for roles 'A', 'B', and 'C', while 'client2' may only advertise for roles 'A' and 'B'.
I have solved this problem in a brute force fashion, but obviously, as the number of related roles increases, solving it takes exponentially longer.
Currently, my algorithm is:
Work out all of the possible combinations for clients advertising a given role, and then assess that role in isolation to generate a list of legal combinations, ordered by preference.
Generate all possible combinations for the lists generated in the previous step, and iterate over these, deciding which is the 'most optimal' based on heuristics around mandatory, illegal, favoured, and unfavoured relationships of the group of roles. This is the part that explodes exponentially as the number of related roles increases.
I have tried some 'early out' approaches whereby a theoretical maximum possible 'score' is determined based on the role relationships, and that as soon as we encounter a combination that has a 'score' >= this that we just stop processing, but I'm wondering if there's a more mathematical solution. Any solution is presumably going to be an approximation of the optimal combination, but that is fine.
Ideally I need something that can run sub second.
Hopefully my explanation is not too vague and someone can point me in the right direction!
Thanks in advance.
Cam
Sounds like the Boolean satisfiability problem (SAT) with some extra complication. SAT is NP-complete, so no polynomial-time algorithm for it is known and every known exact algorithm takes exponential time in the worst case; however, there are some algorithms (mentioned in the link) that can do much better than brute force.
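One way to do better than enumerating every combination is backtracking with constraint checks and a score bound, which abandons a partial assignment as soon as it breaks a rule or can no longer beat the best one found. The sketch below is a simplified illustration and not the asker's system: it assumes exactly one client per role, that the order of each advertised list expresses preference, and that the function name assign_roles and its parameters are hypothetical. The worst case is still exponential.

```python
def assign_roles(advertised, exclusive=(), requires=()):
    """Return the best role -> client assignment found, or None if none is legal.

    advertised: role -> list of willing clients, most preferred first
    exclusive:  pairs (a, b); one client may not hold both a and b
    requires:   pairs (c, d); the client holding c must also hold d
    """
    roles = list(advertised)
    best = {"score": -1, "assignment": None}

    def consistent(assignment):
        for a, b in exclusive:
            if a in assignment and b in assignment and assignment[a] == assignment[b]:
                return False
        for c, d in requires:
            if c in assignment and d in assignment and assignment[c] != assignment[d]:
                return False
        return True

    def score_of(role, client):
        # Earlier in the advertised list means more preferred, so more points.
        prefs = advertised[role]
        return len(prefs) - prefs.index(client)

    # max_rest[i]: upper bound on the points roles[i:] can still contribute.
    max_rest = [0] * (len(roles) + 1)
    for i in range(len(roles) - 1, -1, -1):
        max_rest[i] = max_rest[i + 1] + len(advertised[roles[i]])

    def search(i, assignment, score):
        if score + max_rest[i] <= best["score"]:
            return                      # prune: cannot beat the best so far
        if i == len(roles):
            best["score"], best["assignment"] = score, dict(assignment)
            return
        role = roles[i]
        for client in advertised[role]:
            assignment[role] = client
            if consistent(assignment):  # only descend into legal partial states
                search(i + 1, assignment, score + score_of(role, client))
            del assignment[role]

    search(0, {}, 0)
    return best["assignment"]
```

Soft preferences between roles could be folded into score_of in the same way; a dedicated SAT or constraint solver would be the next step if this is still too slow.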

A* algorithm and games

I am trying to implement a minesweeper solver in Lisp. I know this is not a rare problem, but I didn't find any article that could help me with it. At the start I have a minefield as input, with numbers on the uncovered fields. The algorithm should finish when all mines are found. So, in every step I have to check which fields I can put in my list of mined fields, and choose one field from my list of non-mined fields and open it. Later I will check whether my list of mined fields is complete, and if so the algorithm is done. I would appreciate any help. I'm not asking for source code, but I need good ideas. I am not experienced with this kind of problem.
I HAVE to use the A* algorithm. And I don't need to open all unopened fields... I need to find the positions of all mined fields. And of course it has to be the SHORTEST path to do that. When I find the positions of all mined fields, the algorithm is finished. So, once more, I need to find all mined fields while opening the smallest number of fields. And of course I need a heuristic for my algorithm which will help to choose one of the safe unopened fields.
And that list of safe unopened fields needs to be determined after every opening. So I need to call the main function, which will check whether I have found all mined fields; if not, all safe adjacent unopened fields need to be added to the list of paths, and the path with the best heuristic will be chosen.
I did implement a minesweeper solver in my first year at the University so I can give you some tips. (This is not using A* algorithm)
Important - Not all positions are solvable.
Backtracking over the whole mine field is a bit complicated for advanced difficulties (complicated = takes some time; consider all the possibilities for placing 100 mines in a 30x30 field).
You can solve everything locally, in the same way a human solves the minesweeper. The potential of this is to give the users a hint how to continue instead of solving everything.
Example:
Have a separate mine field where you do the solving
Find all the unsolved cells that have a solved (number/ known mine) cell close enough (2 cell distance)
For every such cell, take a 5x5 neighborhood with the cell in the center, find every possibility (backtracking) and check if the possibilities have something in common (mines/non-mines); if yes, you can mark the mines and uncover the non-mines.
Repeat while you can uncover something.
When you cannot uncover anything and the number of remaining mines is small enough, you can try backtracking over the whole field.
I hope I remember it correctly, I did some proofs why the 5x5 area is enough to check but it was almost 10 years ago.
You do not need the A* algorithm; its purpose is to find the shortest path in a graph (such as the shortest path between two places in a map, or the smallest amount of moves that will solve a puzzle). You will probably want to use a technique that is known as backtracking.
As long as there are unopened fields, pick an unopened field that is next to an open field, and tentatively flag it as a mine. Then, look at an unopened field that is adjacent to the previous one as well as to an opened field, and flag that one as a mine too, if this doesn't contradict the adjacent numbers - if it does, flag it as safe instead. Continue. Eventually, you will have looked at all unopened fields that surround the current area and have found one possible way of flagging the fields as safe or unsafe. However, this was based on several guesses, so now you need to go back to the last field where you made a guess and then make the opposite guess and then move forwards again to get another possible flag combination. Then, go even further back, revise your guesses, and so on. This can be implemented quite neatly with recursion. Eventually, you will have a collection of possible flag combinations. If you can find a field that is safe in all possible flag combinations, open that field. Otherwise, pick a field that is safe in as many flag combinations as possible.
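The "safe in all possible flag combinations" test can be sketched as follows. This is a simplified illustration in Python rather than Lisp, and it uses plain enumeration via itertools.product instead of recursion with early contradiction checks, so it only suits a small frontier. The function name classify_frontier and the constraint format (a group of unopened cells plus the number of mines still hidden among them) are assumptions made for the example.

```python
from itertools import product


def classify_frontier(constraints):
    """Find cells that are safe, or mined, in every consistent flag combination.

    constraints: list of (cells, mines) pairs, each meaning that exactly
    `mines` of the unopened `cells` next to one numbered field are mined.
    Returns (always_safe, always_mined) as two sets of cells.
    """
    cells = sorted({cell for group, _ in constraints for cell in group})
    solutions = []
    for bits in product([False, True], repeat=len(cells)):
        mined = dict(zip(cells, bits))
        # Keep only combinations that agree with every numbered field.
        if all(sum(mined[c] for c in group) == n for group, n in constraints):
            solutions.append(mined)
    always_safe = {c for c in cells if solutions and not any(s[c] for s in solutions)}
    always_mined = {c for c in cells if solutions and all(s[c] for s in solutions)}
    return always_safe, always_mined
```

Cells in always_safe can be opened and cells in always_mined flagged; if both sets are empty, the solver has to fall back on the field that is safe in the most combinations.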

Prolog association list

I am writing a simple program safety checker in Prolog and I need a data structure to hold variable valuation. Since I want to detect when I am visiting same state again, this structure must support some reasonable comparison semantics, so I can store visited states in set.
library(avl) has convenient getter/setter interface.
The problem is, AVL holding the same mapping can take multiple forms.
Thus two identical states would be considered distinct if their AVL representation differs.
A structure holding mapping in ordered lists would be free of this problem. However, I can't find anything like that in Sicstus docs. Is there any standard structure that does what I need, or do I have to implement it myself?
SICStus does have ordered sets, but you can stay with AVL trees: convert each AVL to an ordered list of key-value pairs (avl_to_list/2 in library(avl)) and compare those lists instead.
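The idea is independent of Prolog: reduce each mapping to a canonical ordered list of pairs before storing it in the visited set. Here is a small Python analogue, offered only as an illustration of the principle; the names canonical and visit are hypothetical.

```python
def canonical(valuation):
    """Order-independent, hashable form of a variable -> value mapping."""
    # Keys are unique, so sorting the pairs orders them by key alone.
    return tuple(sorted(valuation.items()))


visited = set()


def visit(valuation):
    """Return True if this state is new, False if it was seen before."""
    key = canonical(valuation)
    if key in visited:
        return False
    visited.add(key)
    return True
```

Two valuations built in a different order, such as {"x": 1, "y": 2} and {"y": 2, "x": 1}, map to the same key, so the second visit is recognised as a repeat.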

How to design a physics problem Database?

I have both problems and solutions to over twenty years of physics PhD qualifying exams that I would like to make more accessible, searchable, and useful.
The problems on the Quals are organized into several different categories. The first category is Undergraduate or Graduate problems. (The first day of the exam is Undergraduate, the second day is Graduate). Within those categories there are several subjects that are tested: Mechanics, Electricity & Magnetism, Statistical Mechanics, Quantum Mechanics, Mathematical Methods, and Miscellaneous. Other identifying features: Year, Season, and Problem number.
I'm specifically interested in designing a web-based database system that can store the problem and solution and all the identifying pieces of information in some way so that the following types of actions could be done.
Search and return all Electricity & Magnetism problems.
Search and return all graduate Statistical Mechanics problems.
Create a random qualifying exam, meaning a new 20 question test randomly picking 2 Undergrad mechanics problems, 2 Undergrad E&M problems, etc. from past qualifying exams (over some restricted date range).
Have the option to hide or display the solutions on results.
Any suggestions or comments on how best to do this project would be greatly appreciated!
I've written up some more details here if you're interested.
For your situation, it seems that implementing the interface matters more than the data storage. To store the data, you can use a database table or tags. Each record in the database (or tag) should have the following properties:
Year
Season
Undergraduate or Graduate
Subject: CM, EM, QM, SM, Mathematical Methods, and Miscellaneous
Problem number (is it necessary?)
Question
Answer
Search and return all Electricity & Magnetism problems.
Directly query the database and you will get an array, then display some or all questions.
Create a random qualifying exam, meaning a new 20 question test randomly picking 2 Undergrad mechanics problems, 2 Undergrad E&M problems, etc. from past qualifying exams (over some restricted date range).
To generate a random exam, you should first decide the number of questions for each category and the years they are drawn from. For example, if you want 2 UG EM questions, query the database for all UG EM questions and then randomly shuffle the resulting array. Finally, select the first two and display them to the student. Continue with the other categories and you will get a complete random exam paper.
Have the option to hide or display the solutions on results.
It is your job to determine whether you want the students to see the answers. It should be controlled by only one variable.
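The approach in this answer can be sketched with Python's built-in sqlite3 module. Everything specific here is an assumption for illustration: the table name problems, the column names, the 'UG'/'G' level codes, the placeholder rows, and the helper names pick and make_exam. random.sample is used in place of "shuffle and take the first two", which selects the same way.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE problems (
        id INTEGER PRIMARY KEY,
        year INTEGER, season TEXT,
        level TEXT,      -- 'UG' or 'G'
        subject TEXT,    -- e.g. 'CM', 'EM', 'QM', 'SM'
        number INTEGER, question TEXT, answer TEXT
    )
""")
# Placeholder rows so the sketch runs; real data would be loaded instead.
rows = [
    (year, "Fall", level, subject, n, f"Q {year} {level} {subject} {n}", "A")
    for year in range(2000, 2005)
    for level in ("UG", "G")
    for subject in ("CM", "EM")
    for n in (1, 2)
]
conn.executemany(
    "INSERT INTO problems (year, season, level, subject, number, question, answer) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def pick(level, subject, count, first_year, last_year):
    """Randomly choose `count` problems of one category within a year range."""
    found = conn.execute(
        "SELECT id, question, answer FROM problems "
        "WHERE level = ? AND subject = ? AND year BETWEEN ? AND ?",
        (level, subject, first_year, last_year)).fetchall()
    return random.sample(found, count)  # raises ValueError if too few match


def make_exam(blueprint, first_year, last_year, show_answers=False):
    """blueprint maps (level, subject) to how many problems to draw."""
    exam = []
    for (level, subject), count in blueprint.items():
        for _id, question, answer in pick(level, subject, count, first_year, last_year):
            # One flag decides whether solutions are included at all.
            exam.append((question, answer if show_answers else None))
    return exam
```

For example, make_exam({("UG", "CM"): 2, ("UG", "EM"): 2}, 2001, 2003) draws four undergraduate problems from those three years with the solutions hidden.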
Are "Electricity & Magnetism" and "Statistical Mechanics" mutually exclusive categorizations, along the same dimension? Are there multiple dimensions in categories you want to search for?
If the answer is yes to both, then I would suggest you look into multidimensional data modeling. As a physicist, you've got a leg up on most people when it comes to evaluating the number of dimensions to the problem. Analyzing reality in a multidimensional way is one of the things physicists do.
Sometimes obtaining and learning an MDDB tool is overkill. Once you've looked into multidimensional modeling, you may decide you like the modeling concept, but you still want to implement using relational databases that use the SQL interface.
In that case, the next thing to look into is star schema design. Star schema is quite different from normalization as a design principle, and it doesn't offer the same advantages and limitations. But it's worth knowing in the case where the problem is really a multidimensional one.

Resources