This is an interview question: How do you serialize a graph? I saw this answer, but I am not sure it is enough.
It looks like a very confusing "open question", and candidates are probably expected to ask more questions about the requirements: what the nodes and edges are, how they are themselves serialized, whether the graph is weighted, directed, etc., and how many nodes and edges it has. What about the infrastructure? Is it a plain file system, or should/can we use a database?
So, how would you answer this question?
I think the answer you provided is quite reasonable. IMO, you basically need to know the application background, so I would ask at least:
- Is it directed or not?
- What properties are associated with the vertices, the edges, and the graph itself?
- Is the graph sparse? (If so, we'd better not use an adjacency matrix.)
The simplest way is to store it as an edge list.
However, different applications have some classical ways to do it.
For example, if you are doing circuit simulation, the graph is sparse and the resulting graph/matrix can be stored in column-compressed form. If you are solving a (min-cost) max-flow problem, there is already the DIMACS format, which public solvers can read and write. A structured format is also a good choice if you want it to be human-readable; XML can provide self-validation (GraphML already exists as the standard). By the way, the DOT format is quite self-contained.
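A minimal sketch of the column-compressed idea in Python, assuming vertices are numbered 0..n-1 and each directed edge (u, v, w) is read as the matrix entry A[u][v] = w (the function name `edges_to_csc` is hypothetical). Column v then lists the edges arriving at v, and the whole matrix lives in three flat arrays. If you use SciPy, `scipy.sparse.csc_matrix` keeps the same three arrays as `indptr`, `indices` and `data`, so you may not need to roll your own.

```python
from typing import List, Tuple


def edges_to_csc(n: int, edges: List[Tuple[int, int, float]]):
    """Convert a weighted edge list (u, v, w), read as A[u][v] = w, to CSC arrays.

    Returns (col_ptr, row_idx, values): the row indices and weights stored in
    column v are row_idx[col_ptr[v]:col_ptr[v + 1]] and values[col_ptr[v]:col_ptr[v + 1]].
    """
    # Count how many entries fall into each column.
    counts = [0] * n
    for _, v, _ in edges:
        counts[v] += 1
    # Prefix sums give the start offset of every column.
    col_ptr = [0] * (n + 1)
    for v in range(n):
        col_ptr[v + 1] = col_ptr[v] + counts[v]
    # Scatter each edge into the next free slot of its column.
    row_idx = [0] * len(edges)
    values = [0.0] * len(edges)
    next_slot = col_ptr[:-1]  # slicing makes a copy of the column start offsets
    for u, v, w in edges:
        k = next_slot[v]
        row_idx[k] = u
        values[k] = w
        next_slot[v] += 1
    return col_ptr, row_idx, values
```

Row indices inside each column keep the input order; sort them if the consumer expects sorted columns.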
Meh. Whatever you store it in, it's basically:
Output each vertex in the graph. If you don't have all the vertices first, it's a PITA to rebuild the graph when you're reading it back in.
Now you can store edges between vertices. Hopefully your vertices have some form of ID to uniquely identify them. The version of this I've seen is "store a (graph|tree) in a database". So, read in the nodes and store them in a hashtable or similar for O(1) amortized lookup. Then, for each edge, look up the source ID and the destination ID, and link them.
Voila, you've deserialized it. If it's not a DB, the same idea generally holds - serialize nodes first, then edges.
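The nodes-first, edges-second recipe can be sketched in Python as follows. The line-based text format (one tab-separated `V` record per vertex, then one `E` record per edge) is a hypothetical format made up for this example, and it assumes IDs and labels contain no tabs or newlines. The dict plays the role of the hashtable that gives O(1) amortized lookup of each edge's endpoints.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # identity comparison; field-wise == would recurse through cycles
class Vertex:
    id: str
    label: str
    out: List["Vertex"] = field(default_factory=list, repr=False)  # linked successors


def serialize(vertices: List[Vertex]) -> str:
    # All vertices first...
    lines = [f"V\t{v.id}\t{v.label}" for v in vertices]
    # ...then every edge, referring to its endpoints by ID only.
    lines += [f"E\t{v.id}\t{w.id}" for v in vertices for w in v.out]
    return "\n".join(lines)


def deserialize(text: str) -> Dict[str, Vertex]:
    by_id: Dict[str, Vertex] = {}  # hashtable: ID -> vertex object
    for line in text.splitlines():
        kind, a, b = line.split("\t")
        if kind == "V":
            by_id[a] = Vertex(a, b)
        elif kind == "E":
            # Every vertex was written before any edge, so both lookups succeed.
            by_id[a].out.append(by_id[b])
        else:
            raise ValueError(f"unknown record type: {kind!r}")
    return by_id
```

For example, a graph with edges a→b, a→c and c→a round-trips to fresh `Vertex` objects in which `g["c"].out[0] is g["a"]`.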
I would like to test my recently created algorithm on large (50+ node) graphs. Preferably, they would be specifically challenging graphs, and known tours would exist (for at least most of them).
Problem sets for this problem do not seem as easy to find as for the TSP. I am aware of the Flinders challenge set available at http://www.flinders.edu.au/science_engineering/csem/research/programs/flinders-hamiltonian-cycle-project/fhcpcs.cfm
However, they seem to be directed. I can probably alter my algorithm to work for directed, but it will take time and likely induce bugs. I'd prefer to know if it can work for undirected first.
Does anyone know where problem sets are available? Thank you.
quick edit:
Now I am unsure whether the Flinders set is directed or not; it doesn't say. The examples make it seem like it may actually be undirected.
Check this video:
https://www.youtube.com/watch?v=G1m7goLCJDY
Also check the in-depth sequel to the video.
You can decide for yourself how many nodes you want to add to the graph.
It does require you to construct the data yourself, which should be doable.
One note: the problem is about a path, not a cycle, but you can overcome this by connecting the start and end nodes.
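Assuming the linked video is the square-sum problem (arrange the numbers 1..n so that every adjacent pair sums to a perfect square), constructing the data takes only a few lines. The sketch below, with hypothetical function names, builds the undirected graph whose vertices are 1..n and whose edges join pairs that sum to a square; a Hamiltonian path in this graph is a solution, and n is the knob for instance size.

```python
import math
from typing import Dict, List, Tuple


def square_sum_graph(n: int) -> Dict[int, List[int]]:
    """Adjacency lists on vertices 1..n; i and j are adjacent iff i + j is a perfect square."""
    adj: Dict[int, List[int]] = {v: [] for v in range(1, n + 1)}
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            s = i + j
            r = math.isqrt(s)  # exact integer square root (Python 3.8+)
            if r * r == s:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def edge_list(adj: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Each undirected edge once, as (smaller, larger), ready to write out for a solver."""
    return [(u, v) for u in adj for v in adj[u] if u < v]
```

The double loop is O(n²), which is fine for the 50+ node sizes mentioned in the question.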
When building a graph, it is usually necessary to specify the 'type' of vertices. Conceptually, I see this could be done by applying a vertex label or property to each vertex (e.g. Bob, Label: Man), or alternatively by linking a vertex to another 'type' vertex (e.g. Bob --IS A--> Man).
To find a list of all vertices of type 'Man', I can write Gremlin queries that work for both of these approaches. But what is best practice?
Best practice: keep your data model simple and make sure it is compatible with efficient indexing by the underlying graph database. There is no one size fits all solution at the TinkerPop level.
It really depends on your data model as well as the indexing capabilities of the underlying database, not to mention the way the data is actually serialized on disk. Ultimately, it all boils down to the way you expect to query your graph and the kind of performance you wish to have.
This being said, people typically use vertex labels, sometimes in conjunction with a type property of some kind. Graph implementers should be able to provide efficient indexes for answering such queries. Labels also give a simpler graph model, which is an important thing to consider.
Depending on the size of your graph, you could run into performance issues when modeling types as vertices, since a Man type vertex could quickly become a supernode.
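To make the trade-off concrete without tying it to any particular TinkerPop provider, here is a toy in-memory model in Python. The `ToyGraph` class and its methods are hypothetical and are not a TinkerPop or Gremlin API. It keeps a label index (roughly what a database label index provides) and can also link instances to a shared type vertex with `IS_A` edges. Both approaches answer "all vertices of type Man", but the type vertex's degree grows with every instance, which is the supernode concern.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class ToyGraph:
    """Hypothetical in-memory graph used only to compare two ways of modelling vertex types."""

    def __init__(self) -> None:
        self.label_of: Dict[str, str] = {}
        # label -> vertex IDs, standing in for a database's label index
        self.label_index: Dict[str, Set[str]] = defaultdict(set)
        # target vertex -> [(source vertex, edge label)]
        self.in_edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_vertex(self, vid: str, label: str) -> None:
        self.label_of[vid] = label
        self.label_index[label].add(vid)

    def add_edge(self, src: str, edge_label: str, dst: str) -> None:
        self.in_edges[dst].append((src, edge_label))

    # Approach 1: the type is the vertex label (compare Gremlin's hasLabel step).
    def by_label(self, label: str) -> Set[str]:
        return set(self.label_index[label])

    # Approach 2: the type is a vertex; instances point at it with IS_A edges.
    def by_type_vertex(self, type_vid: str) -> Set[str]:
        return {src for src, lbl in self.in_edges[type_vid] if lbl == "IS_A"}

    def in_degree(self, vid: str) -> int:
        return len(self.in_edges[vid])
```

With 1,000 people modelled both ways, the two queries return the same set, but `in_degree("Man")` is 1,000: every new instance adds another edge to that one vertex.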
I am asking about algorithms that would be useful for querying a Semantic Web database to get all the RDF triples related to an original object.
For example, if the original object is the movie "Inception", I want an algorithm that builds queries to get the RDF triples for the cast of the movie, the studio, the country, etc., so that I can build a relationship graph.
The closest example is the answer to this question, especially this class. I want similar algorithms, or maybe terms to search for in order to produce such an algorithm. I am thinking some modification of a graph traversal algorithm could work, but I'm not sure.
NOTE: My project is in ASP.NET, so it would help to use existing .NET libraries.
You should be able to do a simple breadth-first-search to get all the objects that are a certain distance away from a given node.
You'll need to know something about the schema because some neighboring nodes are more meaningful than others. For example, in Freebase, we have intermediate nodes that link a film to an actor and a role. You need to know to go 2-ply deep to get at the actor and the role because just saying that the film is related to the intermediate nodes is not very interesting.
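A minimal sketch of that depth-limited breadth-first search, assuming the data has already been fetched into a list of `(subject, predicate, object)` triples. The identifiers in the example are made-up placeholders, not real Freebase IDs, and the optional `allowed_predicates` filter is a hypothetical way to encode the schema knowledge about which neighbours matter. Using `max_depth=2` is what lets the search step through an intermediate node and still reach the actor and the role. Against a live endpoint, each expansion would become a query for the triples around one node instead of a scan of an in-memory list; the loop itself should carry over to .NET unchanged.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def related(triples: Iterable[Triple], start: str, max_depth: int,
            allowed_predicates: Optional[Set[str]] = None) -> Dict[str, int]:
    """Return {node: hop distance} for every node within max_depth hops of start.

    Triples are followed in both directions, because the start object may be
    either the subject or the object of a relevant triple.
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if allowed_predicates is None or p in allowed_predicates:
            neighbours[s].add(o)
            neighbours[o].add(s)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand past the depth limit
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist
```

For a film linked to an intermediate performance node `perf:1`, which in turn links to an actor and a character, depth 1 stops at `perf:1`, while depth 2 also returns the actor and the character.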
Did you take a look at "property paths"?
Property Paths give a more succinct way to write parts of basic graph patterns and also extend matching of triple pattern to arbitrary length paths. Property paths do not invalidate or change any existing SPARQL query.
Triple stores and SPARQL engines such as OWLIM and AllegroGraph support them.
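For intuition about what a property path matches, this small Python sketch evaluates the one-or-more path `start :p+ ?x` over an in-memory list of triples. It returns every node reachable from `start` through one or more `p` edges, which is a transitive closure computed by breadth-first search. The triples, the predicate names and the function name are placeholders; a SPARQL 1.1 engine does this for you when you write `+` in the query.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def one_or_more(triples: Iterable[Tuple[str, str, str]], start: str, pred: str) -> Set[str]:
    """Nodes ?x such that `start pred+ ?x` holds, i.e. reachable via one or more pred edges."""
    succ: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if p == pred:
            succ.setdefault(s, []).append(o)
    reached: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, []):
            if nxt not in reached:  # also stops the search from looping on cycles
                reached.add(nxt)
                queue.append(nxt)
    return reached
```

Note that `start` itself is in the result only if some `pred` cycle leads back to it, since `+` requires at least one step.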
I'm hoping you can help me out with some technical questions on graphs/trees.
I'm trying to display the creation of objects in systems.
It's really a tree structure.
It has some interesting requirements.
a)
One node can have many children. Say 20. Maybe more.
e.g. one library can be used by many objects.
b)
A child node can have many parents. Say up to 20.
e.g. many libraries are used by one procedure or object.
c)
A particular node can appear in more than one place.
e.g. a generic print or logging function is called in many procedures.
Note: This is just an -example- in tech terms I expect you will understand.
It is NOT the issue I need to model. No need to discuss it.
As I've thought about it, I realized that it's not a simple binary tree, or a linked list.
1)
What kind of data structure could I save all the data in?
2)
How could I produce a graph of this in java?
3)
What is a free open source graphing software that could graph such a tree?
Such as Neo4j
Perhaps in formats:
- as a tree, with a root, trunk, branches, and leaves?
- Like the graphs you see now, depicting social networks, with the root node in the center?
4)
Any good websites, or tutorials on this subject?
Thanks a lot!
Check out prefuse. It's old, but it works. You'll have to invest a bit of time to learn how to use it, though. Once you get there, it's just a matter of creating a prefuse.data.Graph object, filling in your nodes and their neighbors, and then creating the visualization.
If you're open to other solutions, check out d3.js, which draws graphs with JavaScript on an SVG element in your browser.
If this is really about objects, then maybe UML can help. It's designed to generate graphs of object relationships. There are tons of free UML tools out there. I'd download one and see if you can shoehorn your application into it.
JGraphT can represent your graph structure, and you can use JGraph for visualisation.
For an example visualization, look at this.
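On question 1: the structure described (many children per node, many parents per node, and the same node appearing in several places) is a directed graph rather than a tree, and, as long as no node is directly or indirectly its own ancestor, a directed acyclic graph (DAG). A minimal Python sketch of storing it, with hypothetical names: each node exists once, keyed by name, with a set of children and a set of parents, so a shared logging function is a single entry that many parents point to. The Kahn's-algorithm check is a cheap way to confirm the data really is acyclic before drawing it top-down like a tree.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


class DependencyGraph:
    """Hypothetical DAG store: every node exists once, edges go parent -> child."""

    def __init__(self) -> None:
        self.children: Dict[str, Set[str]] = defaultdict(set)
        self.parents: Dict[str, Set[str]] = defaultdict(set)
        self.nodes: Set[str] = set()

    def link(self, parent: str, child: str) -> None:
        self.nodes.update((parent, child))
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def roots(self) -> Set[str]:
        """Nodes with no parents: the natural top level of a tree-like drawing."""
        return {n for n in self.nodes if not self.parents[n]}

    def topological_order(self) -> List[str]:
        """Kahn's algorithm; raises ValueError if the graph has a cycle (is not a DAG)."""
        indegree = {n: len(self.parents[n]) for n in self.nodes}
        ready = deque(sorted(n for n, d in indegree.items() if d == 0))
        order: List[str] = []
        while ready:
            n = ready.popleft()
            order.append(n)
            for c in sorted(self.children[n]):
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
        if len(order) != len(self.nodes):
            raise ValueError("cycle detected")
        return order
```

The `children` map is essentially the adjacency structure you would load into a graph library such as JGraphT or hand to a renderer.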
What's the best way to represent graph data structures in LabVIEW?
I'm doing some basic algorithm review over the holiday, and I'd prefer to not implement all of the storage and traversals myself, if possible.
(I'm aware that there was a thread a few years ago on LAVA, is that my best bet?)
I've never had a need to do this myself, so I never really looked into it, but as far as I know some people have done some work on it.
Brian K. has posted something over here, although it's been a long time since I looked at it:
https://decibel.ni.com/content/docs/DOC-12668
If that doesn't help, I would suggest you read this and then try sending a PM to Daklu there, as he's the most likely candidate to have something.
https://decibel.ni.com/content/thread/8179?tstart=0
If not, I would suggest posting a question on LAVA, as you're more likely to find the relevant people there.
Well, from a simple point of view, you don't have that many options for graphs. Which representation is most convenient really depends on the types of algorithms you are running.
An adjacency matrix is simple, but it can be slow for some tasks and wasteful if the graph is not dense.
You can keep a couple of lists and hash maps of your edges and vertices. If each edge or vertex is assigned a unique index into its list when it is created, it's pretty simple to keep things under control. Each vertex can then be associated with a list of its neighbors. Depending on your needs, you could divide that neighbor list into in-edges and out-edges. Also, depending on your lookup needs, you could index edges by their source vertex, their target vertex, or both, or simply by a unique index number.
I had a glance at the LabVIEW quick reference, and while it was not obvious from there how you would do that, as long as it has arrays of some sort, you can implement a graph. I'm sure you'll be fine.
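LabVIEW code is graphical, so it can't be shown here as text, but the list-and-index layout described above maps onto plain arrays. The Python sketch below (hypothetical names) keeps vertices and edges in lists, uses each item's position as its unique index, gives every vertex separate lists of outgoing and incoming edge indices, and keeps a map from (source, target) to edge index for lookups. In LabVIEW the lists would presumably become arrays, for example arrays of clusters.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class IndexedGraph:
    """Vertices and edges live in lists; an item's list position is its unique index."""
    vertex_data: List[str] = field(default_factory=list)
    edge_ends: List[Tuple[int, int]] = field(default_factory=list)  # edge index -> (src, dst)
    out_edges: List[List[int]] = field(default_factory=list)        # vertex -> outgoing edge indices
    in_edges: List[List[int]] = field(default_factory=list)         # vertex -> incoming edge indices
    edge_lookup: Dict[Tuple[int, int], int] = field(default_factory=dict)  # (src, dst) -> edge index

    def add_vertex(self, data: str) -> int:
        self.vertex_data.append(data)
        self.out_edges.append([])
        self.in_edges.append([])
        return len(self.vertex_data) - 1

    def add_edge(self, src: int, dst: int) -> int:
        idx = len(self.edge_ends)
        self.edge_ends.append((src, dst))
        self.out_edges[src].append(idx)
        self.in_edges[dst].append(idx)
        self.edge_lookup[(src, dst)] = idx  # a parallel edge would overwrite this entry
        return idx

    def successors(self, v: int) -> List[int]:
        return [self.edge_ends[e][1] for e in self.out_edges[v]]

    def predecessors(self, v: int) -> List[int]:
        return [self.edge_ends[e][0] for e in self.in_edges[v]]
```

Splitting in- and out-lists costs one extra list per vertex but makes both `successors` and `predecessors` proportional to the vertex's degree rather than to the total number of edges.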