How to create a bipartite graph in GraphX - graph

I am able to build a graph using a vertexRDD and an edgeRDD via the GraphX API, no problem there. i.e.:
val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD)
However, I don't know where to start if I want to use two separate vertexRDDs instead of just one (a bipartite graph). For example, a graph containing shopper and product vertices.
My question is broad so I'm not expecting a detailed example, but rather a hint or nudge in the right direction. Any suggestions would be much appreciated.

For example to model users and products as a bipartite graph we might do the following:
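import org.apache.spark.graphx.{Edge, Graph, VertexId, VertexRDD}
import org.apache.spark.rdd.RDD
sealed trait VertexProperty
case class UserProperty(name: String) extends VertexProperty
case class ProductProperty(name: String, price: Double) extends VertexProperty
val users: RDD[(VertexId, VertexProperty)] = sc.parallelize(Seq(
  (1L, UserProperty("user1")), (2L, UserProperty("user2"))))
val products: RDD[(VertexId, VertexProperty)] = sc.parallelize(Seq(
  (1001L, ProductProperty("foo", 1.00)), (1002L, ProductProperty("bar", 3.99))))
// Both vertex sets share one ID space, so user and product IDs must not collide.
val vertices = VertexRDD(users ++ products)
// Illustrative edges (not from the original answer): each connects a user to a product.
val edges: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(1L, 1001L, "bought"), Edge(2L, 1002L, "viewed")))
// The graph then has the type:
val graph: Graph[VertexProperty, String] = Graph(vertices, edges)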
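The modelling idea is independent of Spark. Here is a minimal Python sketch of it (this is not GraphX code, and the names `User`, `Product`, and `check_bipartite` are made up for illustration): both vertex kinds live in one ID space, and the graph is bipartite as long as every edge goes from a user to a product.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class User:
    name: str


@dataclass(frozen=True)
class Product:
    name: str
    price: float


# One vertex type covering both kinds, like the shared VertexProperty trait.
Vertex = Union[User, Product]


def check_bipartite(vertices: Dict[int, Vertex],
                    edges: List[Tuple[int, int, str]]) -> bool:
    """Return True if every edge goes from a User vertex to a Product vertex."""
    for src, dst, _label in edges:
        if not isinstance(vertices.get(src), User):
            return False
        if not isinstance(vertices.get(dst), Product):
            return False
    return True


users = {1: User("user1"), 2: User("user2")}
products = {1001: Product("foo", 1.00), 1002: Product("bar", 3.99)}
# Merging is safe only because the two ID ranges do not overlap.
vertices: Dict[int, Vertex] = {**users, **products}
# Hypothetical edge data: (source id, destination id, edge label).
edges = [(1, 1001, "bought"), (2, 1002, "viewed")]
```

Nothing in the vertex type itself prevents a user-to-user edge, so a check like this (or a filter on the edge set) is what actually enforces the bipartite property.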

Related

Canonical way to represent idea of sum type of records that all extend a "base" record

I'm new to PureScript. I was searching for sealed classes in Purescript to get an idea of how one would implement this, but I don't think I have the necessary PS jargon yet.
What is the canonical way in PureScript to have a bunch of records that extend a "base" record, but then have one sum type representing a "sealed" collection of those.
Something like in Kotlin,
sealed class Glazing(val name: String, val area: Int) {
    class Window(name: String, area: Int, val frameMaterial: String) : Glazing(name, area)
    class Door(name: String, area: Int, val isPreHung: Boolean) : Glazing(name, area)
}
in TypeScript, you'd probably do something like
interface BaseGlazing { ... }
interface Door extends BaseGlazing { ... }
interface Window extends BaseGlazing { ... }
type Glazing = Door | Window
and then you'd either take `A extends BaseGlazing` or `Glazing` (and use type guards) to write either of the two functions described below.
Essentially I want a base class (that is technically abstract), things that extend it, and then a sum type/discriminated union of the extensions. That way I can write, say, changeName :: Glazing -> Glazing (premised on the base class having a name prop), and also something like calculateTotalLightPenetration :: Array Glazing -> Number (premised on the discriminated union being one of Door or Window, since light penetration will use a different formula for doors vs. windows).
The idea of "inheritance" (aka "is-a" relationship) is technically possible to model in PureScript, but it's hard and awkward. And there is a good reason for it: inheritance is almost never (and I am tempted to say "never, period") the most convenient, efficient, or reliable way of modeling the domain. Even OOP apologists tend to recommend aggregation over inheritance these days.
One useful thing to observe is that you don't actually need inheritance. What you need is to solve some specific problem in your domain, and inheritance is just a solution that you naturally reach for, which is probably informed by your past experience.
And this leads us to an insight: the particular way to model whatever it is you're modeling would depend on what the actual problem is. Chances are, PureScript has a different mechanism for modeling that.
But if I base my thinking on the specifics you gave in your question (i.e. the changeName and calculateTotalLightPenetration functions), I would model it via aggregation: the "glazing" would be the surrounding type, and it would have, as one of its parts, the specific kind of glazing. This would look something like this:
import Prelude
import Data.Foldable (sum)
type Glazing = { name :: String, area :: Int, kind :: GlazingKind }
data GlazingKind = Window { frameMaterial :: String } | Door { isPreHung :: Boolean }
changeName :: Glazing -> Glazing
changeName g = g { name = "new name" }
calculateTotalLightPenetration :: Array Glazing -> Number
calculateTotalLightPenetration gs = sum $ individualPenetration <$> gs
  where
  individualPenetration g = case g.kind of
    Door _ -> 0.3
    Window _ -> 0.5
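For readers who don't know PureScript, the same aggregation shape can be sketched in Python. This is only an illustration: the snake_case names and the `new_name` parameter are my own choices, and the 0.3/0.5 factors are the placeholder values from the snippet above, not real formulas.

```python
from dataclasses import dataclass, replace
from typing import List, Union


@dataclass(frozen=True)
class Window:
    frame_material: str


@dataclass(frozen=True)
class Door:
    is_pre_hung: bool


# The "sealed" set of kinds: a glazing is exactly one of these.
GlazingKind = Union[Window, Door]


@dataclass(frozen=True)
class Glazing:
    # Shared fields live on the surrounding record, not on a base class.
    name: str
    area: int
    kind: GlazingKind


def change_name(g: Glazing, new_name: str) -> Glazing:
    """Return a copy with a new name; works for every kind of glazing."""
    return replace(g, name=new_name)


def individual_penetration(g: Glazing) -> float:
    """Dispatch on the kind, as the PureScript case expression does."""
    if isinstance(g.kind, Door):
        return 0.3
    return 0.5


def calculate_total_light_penetration(gs: List[Glazing]) -> float:
    return sum(individual_penetration(g) for g in gs)
```

Functions that only need the shared fields never look at `kind`, while functions that depend on the variant dispatch on it, so both use cases from the question are covered without inheritance.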

What's a good pattern to manage impossible states in Elm?

Maybe you can help. I'm an Elm beginner and I'm struggling with a rather mundane problem. I'm quite excited with Elm and I've been rather successful with smaller things, so now I tried something more complex but I just can't seem to get my head around it.
I'm trying to build something in Elm that uses a graph-like underlying data structure. I create the graph with a fluent/factory pattern like this:
sample : Result String MyThing
sample =
    MyThing.empty
        |> addNode 1 "bobble"
        |> addNode 2 "why not"
        |> addEdge 1 2 "some data here too"
When this code returns Ok MyThing, then the whole graph has been set up in a consistent manner, guaranteed, i.e. all nodes and edges have the required data and the edges for all nodes actually exist.
The actual code has more complex data associated with the nodes and edges but that doesn't matter for the question. Internally, the nodes and edges are stored in the Dict Int element.
type alias MyThing =
{ nodes : Dict Int String
, edges : Dict Int { from : Int, to : Int, label : String }
}
Now, in the users of the module, I want to access the various elements of the graph. But whenever I access one of the nodes or edges with Dict.get, I get a Maybe. That's rather inconvenient because by the virtue of my constructor code I know the indexes exist etc. I don't want to clutter upstream code with Maybe and Result when I know the indexes in an edge exist. To give an example:
getNodeTexts : Edge -> MyThing -> Maybe (String, String)
getNodeTexts edge thing =
    case Dict.get edge.from thing.nodes of
        Nothing ->
            --Yeah, actually this can never happen...
            Nothing
        Just fromNode ->
            case Dict.get edge.to thing.nodes of
                Nothing ->
                    --Again, this can never actually happen because the builder code prevents it.
                    Nothing
                Just toNode ->
                    Just ( fromNode.label, toNode.label )
That's just a lot of boilerplate code to handle something I specifically prevented in the factory code. But what's even worse: Now the consumer needs extra boilerplate code to handle the Maybe--potentially not knowing that the Maybe will actually never be Nothing. The API is sort of lying to the consumer. Isn't that something Elm tries to avoid? Compare to the hypothetical but incorrect:
getNodeTexts : Edge -> MyThing -> (String, String)
getNodeTexts edge thing =
    ( Dict.get edge.from thing.nodes |> .label
    , Dict.get edge.to thing.nodes |> .label
    )
An alternative would be not to use Int IDs but use the actual data instead--but then updating things gets very tedious as connectors can have many edges. Managing state without the decoupling through Ints just doesn't seem like a good idea.
I feel there must be a solution to this dilemma using opaque ID types but I just don't see it. I would be very grateful for any pointers.
Note: I've also tried to use both drathier and elm-community elm-graph libraries but they don't address the specific question. They rely on Dict underneath as well, so I end up with the same Maybes.
There is no easy answer to your question. I can offer one comment and a coding suggestion.
You use the magic words "impossible state" but as OOBalance has pointed out, you can create an impossible state in your modelling. The normal meaning of "impossible state" in Elm is precisely in relation to modelling e.g. when you use two Bools to represent 3 possible states. In Elm you can use a custom type for this and not leave one combination of bools in your code.
As for your code, you can reduce its length (and perhaps complexity) with
getNodeTexts : Edge -> MyThing -> Maybe ( String, String )
getNodeTexts edge thing =
    Maybe.map2 (\n1 n2 -> ( n1.label, n2.label ))
        (Dict.get edge.from thing.nodes)
        (Dict.get edge.to thing.nodes)
From your description, it looks to me like those states actually aren't impossible.
Let's start with your definition of MyThing:
type alias MyThing =
{ nodes : Dict Int String
, edges : Dict Int { from : Int, to : Int, label : String }
}
This is a type alias, not a type – meaning the compiler will accept MyThing in place of {nodes : Dict Int String, edges : Dict Int {from : Int, to : Int, label : String}} and vice-versa.
So rather than construct a MyThing value safely using your factory functions, I can write:
import Dict
myThing = { nodes = Dict.empty, edges = Dict.fromList [(0, {from = 0, to = 1, label = "Edge 0"})] }
… and then pass myThing to any of your functions expecting MyThing, even though the nodes connected by Edge 0 aren't contained in myThing.nodes.
You can fix this by changing MyThing to be a custom type:
type MyThing
= MyThing { nodes : Dict Int String
, edges : Dict Int { from : Int, to : Int, label : String }
}
… and exposing it using exposing (MyThing) rather than exposing (MyThing(..)). That way, no constructor for MyThing is exposed, and code outside of your module must use the factory functions to obtain a value.
The same applies to Edge, which I'm assuming is defined as:
type alias Edge =
{ from : Int, to : Int, label : String }
Unless it is changed to a custom type, it is trivial to construct arbitrary Edge values:
type Edge
= Edge { from : Int, to : Int, label : String }
Then however, you will need to expose some functions to obtain Edge values to pass to functions like getNodeTexts. Let's assume I have obtained a MyThing and one of its edges:
myThing : MyThing
-- created using factory functions
edge : Edge
-- an edge of myThing
Now I create another MyThing value, and pass it to getNodeTexts along with edge:
myOtherThing : MyThing
-- a different value of type MyThing
nodeTexts = getNodeTexts edge myOtherThing
This should return Maybe.Nothing or Result.Err String, but certainly not (String, String) – the edge does not belong to myOtherThing, so there is no guarantee its nodes are contained in it.
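To make the argument concrete, here is a rough Python sketch of the same situation (Python rather than Elm, with hypothetical names `MyThing`, `Edge`, `add_edge`, and `node_texts`). The builder validates every edge, yet a lookup must still be able to fail, because an edge obtained from one graph can be handed to another.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Edge:
    src: int
    dst: int
    label: str


class MyThing:
    """Graph that only accepts edges whose endpoints already exist."""

    def __init__(self) -> None:
        self._nodes: Dict[int, str] = {}
        self._edges: List[Edge] = []

    def add_node(self, node_id: int, text: str) -> "MyThing":
        self._nodes[node_id] = text
        return self

    def add_edge(self, src: int, dst: int, label: str) -> Edge:
        # The "factory" check: reject edges that point at missing nodes.
        if src not in self._nodes or dst not in self._nodes:
            raise ValueError("both endpoints must exist before adding an edge")
        edge = Edge(src, dst, label)
        self._edges.append(edge)
        return edge

    def node_texts(self, edge: Edge) -> Optional[Tuple[str, str]]:
        # The edge may come from a different MyThing, so this can still fail.
        src_text = self._nodes.get(edge.src)
        dst_text = self._nodes.get(edge.dst)
        if src_text is None or dst_text is None:
            return None
        return (src_text, dst_text)


my_thing = MyThing().add_node(1, "bobble").add_node(2, "why not")
edge = my_thing.add_edge(1, 2, "some data here too")
my_other_thing = MyThing()  # a different graph that lacks nodes 1 and 2
```

Here `my_thing.node_texts(edge)` yields the two texts, while `my_other_thing.node_texts(edge)` yields `None`, which is the case the optional return type exists to cover.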

OCaml directed graphs vertex module

I have seen some graphs vertex signatures and even come up with my own:
module type VERTEX = sig
type t
type label
val equal : t -> t -> bool
val create : label -> t
val label : t -> label
end
But I have completely no idea how to implement it as a module. What types should t and label be? How can I create a t based on a label? And how do I get the label from a t?
I'm an author of Graphlib, so I can't pass by as this question hits me directly into my heart. Honestly, I was asked this question millions of times offline and never was able to provide a good answer.
The real problem is that the graph interfaces from the OCamlGraph library are all messed up. We started Graphlib as an attempt to fix them. However, OCamlGraph is a valuable repository of Graph algorithms, thus we have constrained ourselves to be compatible with the OCamlGraph interface. The main problem for us was and still is this Vertex interface that basically establishes a bijection between the set of labels and the set of nodes. People usually stumble on this, as this doesn't make sense - why do we need two different types, one for the label and another for the vertex, if they are the same?
Indeed, the simplest implementation of the VERTEX interface is the following module
module Int : VERTEX with type label = int = struct
  type t = int
  type label = int
  (* equal is required by the VERTEX signature *)
  let equal (x : t) (y : t) = x = y
  let create x = x
  let label x = x
end
In that case, we indeed have a trivial bijection (via the identity endofunctor) between the set of labels and the set of vertices.
However, a deeper look shows us that the signature
val create : label -> t
val label : t -> label
is not really a bijection: a bijection is a one-to-one correspondence, and that is neither required nor enforced by the type system. For example, the create function could be an injection of label into t, where the label is some distinctive element of a family of vertices. Correspondingly, the label function could be a forgetful function that returns the distinctive label and forgets everything else.
Given this approach, we can have another implementation:
module Labeled = struct
  type label = int
  type t = {
    label : label;
    data : string;
  }
  let create label = {label; data = ""}
  let label n = n.label
  let data n = n.data
  let with_data n data = {n with data}
  let compare x y = compare x.label y.label
end
In that implementation, we use the label as the identity of a node, and arbitrary attributes can be attached to a node. In this interpretation, the create function partitions the set of all nodes into equivalence classes, where all members of a class share the same identity, i.e., they represent the same real-world entity at different points in time or space. For example,
type color = Red | Yellow | Green
module TrafficLight = struct
  type label = int
  type t = {
    id : label;
    color : color
  }
  let create id = {id; color=Red}
  let label t = t.id
  let compare x y = compare x.id y.id
  let switch t color = {t with color}
  let color t = t.color
end
In this model, we represent a traffic light with its id number. The color attribute doesn't affect an identity of a traffic light (if a traffic light switches to another color it is still the same traffic light, although in a functional programming language it is represented with two different objects).
The main problem with the above representation is that in all graph textbooks the label is used in the opposite meaning - as an opaque attribute. In a textbook, they will refer to the color of a traffic light as a label, and the node itself will be represented as an int. That's why I'm saying that the OCamlGraph interfaces are messed up (and consequently the Graphlib interfaces). So, if you don't want to contradict the textbooks, then you should use unlabeled graphs (an int is probably the best representation of a node). And if you need to attach attributes to your nodes, you can use external finite maps, i.e., arrays, maps, association lists, or any other dictionaries. Otherwise, you need to keep in mind that your label is not a label, but vice versa - the node.
With all this said, let's specify a better interface for a graph vertex:
module type VERTEX = sig
  type id
  type label
  type t
  val create : id -> t
  val id : t -> id
  val label : t -> label
  val with_label : t -> label -> t
end
The proposed interface is compatible with your interface (and thus with the OCamlGraph), as it is isomorphic modulo renaming (i.e., we renamed label to id). It also allows us to create efficient unlabeled nodes, where id = t, as well as attach arbitrary information to a node without relying on external mappings.
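As a language-neutral illustration of the id/label split, here is a small Python sketch (not OCaml and not part of Graphlib; the `Vertex` class and the free functions are hypothetical). The id alone determines a vertex's identity, while the label is an attribute that can change without producing a "different" vertex.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class Vertex:
    id: int
    # compare=False keeps the label out of equality and hashing,
    # so the id alone decides which vertex this is.
    label: Optional[str] = field(default=None, compare=False)


def create(vertex_id: int) -> Vertex:
    """Build an unlabeled vertex from its id."""
    return Vertex(vertex_id)


def with_label(v: Vertex, label: str) -> Vertex:
    """Return the same vertex (same id) carrying a new label."""
    return replace(v, label=label)


light = create(1)
red_light = with_label(light, "red")
green_light = with_label(red_light, "green")
```

All three values compare equal and collapse to one element in a set, which mirrors the traffic-light example: switching the color does not change which light it is.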
Implementing a module based on a signature is like a mini puzzle. Here's how I would analyze it:
The first remark I have when reading that signature, is that there is no way in that signature to build values of type label. So, our implementation will need to be a bit larger, maybe by specifying type label = string.
Now, we have:
val create : label -> t
val label : t -> label
Which is a bijection (the types are "equivalent"). The simplest way to implement that is by defining type t = label, so that it's really only one type, but from the exterior of the module you don't know that.
The rest is
type t
val equal: t -> t -> bool
We said that label = string, and t = label. So t = string, and equal is the string equality.
Boom! here we are:
module String_vertex : VERTEX with type label = string = struct
  type label = string
  type t = string
  let equal = String.equal
  let create x = x
  let label x = x
end
The VERTEX with type label = string part is just if you want to define it in the same file. Otherwise, you can do something like:
(* string_vertex.ml *)
type label = string
type t = string
let equal = String.equal
let create x = x
let label x = x
and any functor F that takes a VERTEX can be called with F(String_vertex).
It would be best practice to create string_vertex.mli with contents include VERTEX with type label = string, though.

OrientDB Create Edge between a Vertex and ODocument

I have a database with a whole bunch of ODocument records. They have their own Class hierarchy and it does not extend V.
I am in the process of adding in new collections and to support some of the features - we would like to use the graph db capabilities.
So I created a new Vertex per
Vertex company = graph.addVertex(null);
I find my existing ODoc and convert that to a vertex as
Vertex person = null;
for (Vertex v : graph.getVertices("Person.name", "Jay")) {
    person = v;
}
and try to create an Edge
Edge sessionInIncident = graph.addEdge(null, company, person, "employs");
The edge creation leads to the following
Class 'Person' is not an instance of V
java.lang.IllegalArgumentException
at com.tinkerpop.blueprints.impls.orient.OrientElement.checkForClassInSchema(OrientElement.java:635)
at com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:905)
at com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.addEdge(OrientBaseGraph.java:685)
In order to be a Vertex, class Person must extend the V class. Try this command:
alter class Person superclass V

Can I insert into a map by key in F#?

I'm messing around a bit with F# and I'm not quite sure if I'm doing this correctly. In C# this could be done with an IDictionary or something similar.
type School() =
    member val Roster = Map.empty with get, set
    member this.add(grade: int, studentName: string) =
        match this.Roster.ContainsKey(grade) with
        | true -> // Can I do something like this.Roster.[grade].Insert([studentName])?
        | false -> this.Roster <- this.Roster.Add(grade, [studentName])
Is there a way to insert into the map if it contains a specified key or am I just using the wrong collection in this case?
The F# Map type is a mapping from keys to values, just like an ordinary .NET Dictionary, except that it is immutable.
If I understand your aim correctly, you're trying to keep a list of students for each grade. The type in that case is a map from integers to lists of names, i.e. Map<int, string list>.
The Add operation on the map actually either adds or replaces an element, so I think that's the operation you want in the false case. In the true case, you need to get the current list, append the new student and then replace the existing record. One way to do this is to write something like:
type School() =
    member val Roster = Map.empty with get, set
    member this.Add(grade: int, studentName: string) =
        // Try to get the current list of students for a given 'grade'
        let studentsOpt = this.Roster.TryFind(grade)
        // If the result was 'None', then use empty list as the default
        let students = defaultArg studentsOpt []
        // Create a new list with the new student at the front
        let newStudents = studentName::students
        // Create & save map with new/replaced mapping for 'grade'
        this.Roster <- this.Roster.Add(grade, newStudents)
This is not thread-safe, because concurrent calls to Add might not update the map properly. However, you can read school.Roster at any time and iterate over it (or share references to it) safely, because it is an immutable structure. If you do not need that guarantee, a standard Dictionary would be perfectly fine too; it depends on your actual use case.
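The "look up, prepend, replace" step is the same in any language with persistent-style updates. As a rough illustration in Python (not F#; `add_student` is a hypothetical helper), the function below builds a new dictionary on every call and never mutates the one it was given:

```python
from typing import Dict, List

Roster = Dict[int, List[str]]


def add_student(roster: Roster, grade: int, student_name: str) -> Roster:
    """Return a new roster with the student added; the input is left untouched."""
    # Use an empty list when the grade has no students yet.
    students = roster.get(grade, [])
    # Build a new list with the new student at the front.
    new_students = [student_name, *students]
    # Copy the mapping, adding or replacing the entry for this grade.
    return {**roster, grade: new_students}


empty: Roster = {}
one = add_student(empty, 2, "Ann")
two = add_student(one, 2, "Bob")
```

Because each call returns a fresh mapping, earlier versions such as `one` stay valid and unchanged after later additions, which is the property that makes the immutable F# map safe to share.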

Resources