Discrete Math - Vertex Coloring - graph

I was given a homework assignment about a week ago. The thing is, I don't understand what my teacher taught, but he gave us homework anyway...
A = {a,b,s}, B = {b,h,t}, C = {a,t,s}, D = {h,t,s}, E = {a,b}, F = {b,t,s}
How do I create a minimal vertex coloring in which A, B, C, D, E and F are the vertices?
I do know how to color vertices, but I don't know how to create the graph from the given sets. Any help? I tried looking on the internet but I didn't come across a question like this.

If the graph is to be interpreted so that two of the vertices A, B, C, D, E, F are adjacent if and only if their sets intersect, an optimal coloring has 5 colors.
The resulting graph is almost the complete graph on 6 vertices: {D,E} is the only edge which is missing (E and F share the element b, so they are adjacent). It contains the complete graph on 5 vertices as the subgraph induced by {A,B,C,D,F}. Consequently, no vertex coloring can use fewer than 5 colors. Since D and E are not adjacent, they can share a color, so the coloring
F : 1
A : 2
B : 3
C : 4
D : 5
E : 5
is a 5-coloring of the graph, which is optimal.
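An answer like this can be checked mechanically. The following is a minimal sketch, assuming the "adjacent if and only if the sets intersect" interpretation; the helper names `is_proper` and `chromatic_number` are made up for illustration, and the brute-force search is only sensible for a graph this small.

```python
from itertools import combinations, product

# The sets from the question; each set is one vertex of the graph.
sets = {
    "A": {"a", "b", "s"},
    "B": {"b", "h", "t"},
    "C": {"a", "t", "s"},
    "D": {"h", "t", "s"},
    "E": {"a", "b"},
    "F": {"b", "t", "s"},
}

# Two vertices are adjacent iff their sets share at least one element.
edges = {(u, v) for u, v in combinations(sorted(sets), 2) if sets[u] & sets[v]}
missing = [pair for pair in combinations(sorted(sets), 2) if pair not in edges]


def is_proper(coloring):
    """A colouring is proper when no edge joins two vertices of the same colour."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number():
    """Try k = 1, 2, ... colours and return the first k that admits a proper colouring."""
    names = sorted(sets)
    for k in range(1, len(names) + 1):
        for colors in product(range(1, k + 1), repeat=len(names)):
            if is_proper(dict(zip(names, colors))):
                return k


print("non-adjacent pairs:", missing)
print("chromatic number:", chromatic_number())
```

Passing any proposed colouring to `is_proper` as a dictionary from vertex name to colour number tells you whether it is valid.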

Related

Why do we have at least three definitions of a graph in math?

Definition 1 - 2 sets and function
Definition 2 - 1 set and 1 family
Definition 3 - 1 relation
Why do we need such diversity? Are some of these definitions old-fashioned, or do all of them have their pros and cons?
Undirected graphs and directed graphs
The third definition differs from the first two because it is about directed graphs, while the first two define undirected graphs. We care about directed graphs and undirected graphs because they are adapted to different situations and to solving different problems.
You can think of directed graphs and undirected graphs as two different objects.
In general, undirected graphs are somewhat easier to reason about, and most often if someone mentions a "graph" without precision, they mean an undirected graph.
Named edges and incidence function
The first two definitions are pretty much equivalent.
The first definition, with (V, E, ѱ) gives "names" to vertices (elements of V) and names to edges (elements of E), and uses an "incidence function" ѱ to tell you which edge in E corresponds to which pair of vertices of V.
The second definition uses only (V', E') and no ѱ. I am calling them V' and E' instead of V and E to make a distinction from the first definition. Here the vertices have "names", they are the elements of V'; but the edges don't really have individual names, and E' is defined as a subset of the set of unordered pairs of elements of V'. Therefore an edge is an unordered pair of elements of V'.
Here is an example of a graph:
By the first definition:
V = {a, b, c, d};
E = {1, 2, 3};
ѱ : E -> {unordered pairs of V}
1 -> ab
2 -> ac
3 -> cd.
By the second definition:
V' = {a, b, c, d}
E' = {ab, ac, cd}.
As you can see, V' = V, and E' is the image of E by ѱ.
If you don't care about "names" for the edges, the second definition is somewhat shorter. But which one you use really doesn't matter; the theorems you might prove with one definition will be equivalent to the theorems you can prove with the other. The difference between the two definitions is just a set theory nitpick of what "edge" means: is it a pair of elements of V, or an element of another set which is mapped to a pair of elements of V by a function? Note that, as long as no two edges join the same pair of vertices, the function ѱ is a bijection between the two sets E and E', so really E and E' are two different names for the same set.
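To make the correspondence concrete, here is a small sketch of the example graph under both definitions. It assumes Python sets and a dictionary standing in for ѱ; the variable names (`psi`, `V_prime`, `E_prime`, `is_injective`) are illustrative only.

```python
# Definition 1: named vertices, named edges, and an incidence function psi.
V = {"a", "b", "c", "d"}
E = {1, 2, 3}
psi = {
    1: frozenset({"a", "b"}),
    2: frozenset({"a", "c"}),
    3: frozenset({"c", "d"}),
}

# Definition 2: the edges are the unordered pairs themselves.
V_prime = set(V)
E_prime = set(psi.values())  # image of E under psi

# psi is injective exactly when no two edge names map to the same pair,
# i.e. when the graph has no parallel edges.
is_injective = len(E_prime) == len(E)

print(sorted("".join(sorted(pair)) for pair in E_prime))  # ['ab', 'ac', 'cd']
print(is_injective)  # True
```

Using `frozenset` for a pair makes "ab" and "ba" the same object, which is what "unordered pair" means here.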
Algorithms, programming languages, and representations of graphs
If you ever have to code an algorithm using a graph in your favourite programming language, you will have to decide how to represent the graph using variables and arrays and all the data structures you are used to.
For the vertices, most often, people use V = {0, 1, 2, ..., n-1} where n is the number of vertices. This is convenient because it means you can use the vertices as indices for an array.
For the edges, sometimes we encode E using an adjacency matrix (a vertex-vertex matrix) of size n*n, with a 1 in cell i,j to indicate an edge between vertices i and j and a 0 in cell i,j to indicate no edge. Here is the adjacency matrix for the graph above (I replaced a,b,c,d with 0,1,2,3 as the names for the vertices):
  0 1 2 3
0 0 1 1 0
1 1 0 0 0
2 1 0 0 1
3 0 0 1 0
Sometimes we encode E using an array of lists: an array of size n, where cell i contains the list of indices of vertices which are neighbours of vertex i. Here is the array of lists for the same graph:
0: 1,2
1: 0
2: 0,3
3: 2
Those two representations are closer to the second definition, since the edges don't have names; we just care about whether each pair of vertices is an edge or not.
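The two representations carry the same information, so one can be converted into the other. A minimal sketch for the example graph, assuming plain Python lists (the function names are hypothetical):

```python
def matrix_to_lists(matrix):
    """Convert an n x n 0/1 matrix into an array of neighbour lists."""
    n = len(matrix)
    return [[j for j in range(n) if matrix[i][j]] for i in range(n)]


def lists_to_matrix(neighbours):
    """Convert an array of neighbour lists back into an n x n 0/1 matrix."""
    n = len(neighbours)
    matrix = [[0] * n for _ in range(n)]
    for i, row in enumerate(neighbours):
        for j in row:
            matrix[i][j] = 1
    return matrix


adjacency_matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
adjacency_lists = matrix_to_lists(adjacency_matrix)
print(adjacency_lists)  # [[1, 2], [0], [0, 3], [2]]
```

The matrix answers "is i,j an edge?" in constant time but always uses n*n cells; the lists use space proportional to the number of edges, which matters for sparse graphs.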
Recently I had to write a C++ program where it was very important for me to number the edges, because I wanted to be able to use them as indices of a matrix. Thus I had V = {0, 1, 2, ..., n-1}; E = {0, 1, 2, ..., m-1}; and then I used a std::map<int, std::pair<int, int>> to map the edge indices to pairs of vertex indices. This representation was closer to your first definition, with a std::map for ѱ. Note that I had to make a choice between mapping edge indices to pairs of vertex indices, or mapping pairs of vertex indices to edge indices. Had I felt the necessity, I could even have used both. The first definition doesn't care, because ѱ is a bijection, so mathematicians can use ѱ and its inverse function ѱ^-1 interchangeably; but the data structure std::map is not a mathematical function, and inverting it might take time.
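The "use both" option can be sketched as follows. This is in Python rather than C++, with dictionaries playing the role of std::map; the edge numbering and the names `edge_to_pair`, `pair_to_edge` and `edge_index` are assumptions made for the example, not the program described above.

```python
# Hypothetical edge numbering for the example graph (vertices 0..3, edges 0..2).
edge_to_pair = {0: (0, 1), 1: (0, 2), 2: (2, 3)}

# Build the inverse once, up front, instead of searching edge_to_pair each time.
# Pairs are stored with the smaller vertex first so (u, v) and (v, u) agree.
pair_to_edge = {pair: edge for edge, pair in edge_to_pair.items()}


def edge_index(u, v):
    """Return the index of the edge {u, v}, or None if the pair is not an edge."""
    return pair_to_edge.get((min(u, v), max(u, v)))


print(edge_index(3, 2))  # 2
print(edge_index(1, 3))  # None
```

Keeping both maps costs extra memory and both must be updated together if edges are added or removed.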
Conclusion
Both definitions are equivalent, and it really doesn't matter which one you use. But if you need to code algorithms using graphs, take some time to consider different representations of graphs and which one will make your algorithm the most efficient.

"sna" or "igraph" : Why do I get different degree values for undirected graph?

I am doing some basic network analysis using networks from the R package "networkdata". To this end, I use the package "igraph" as well as "sna". However, I realised that the results of descriptive network statistics vary depending on the package I use. Most variation is not too grave but the average degree of my undirected graph halved as soon as I switched from "sna" to "igraph".
library(networkdata)
n_1 <- covert_28
library(igraph)
library(sna)
n_1_adjmat <- as_adjacency_matrix(n_1)
n_1_adjmat2 <- as.matrix(n_1_adjmat)
mean(sna::degree(n_1_adjmat2, cmode = "freeman")) # [1] 23.33333
mean(igraph::degree(n_1, mode = "all")) # [1] 11.66667
This doesn't happen in case of my directed graph. Here, I get the same results regardless of using "sna" or "igraph".
Is there any explanation for this phenomenon? And if so, is there anything I can do in order to prevent this from happening?
Thank you in advance!
This is explained in the documentation for sna::degree.
indegree of a vertex, v, corresponds to the cardinality
of the vertex set N^+(v) = {i in V(G) : (i,v) in E(G)};
outdegree corresponds to the cardinality of the vertex
set N^-(v) = {i in V(G) : (v,i) in E(G)}; and total
(or “Freeman”) degree corresponds to |N^+(v)| + |N^-(v)|.
(Note that, for simple graphs,
indegree=outdegree=total degree/2.)
A simpler example than yours makes it clear.
library(igraph)
library(sna)
g = make_ring(3)
plot(g)
AM = as.matrix(as_adjacency_matrix(g))
sna::degree(AM)    # [1] 4 4 4
igraph::degree(g)  # [1] 2 2 2
Vertex 1 has links to both vertices 2 and 3. These count in the
in-degree and also count in the out-degree, so
Freeman = in + out = 2 + 2 = 4
The "Note" in the documentation states this.
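The arithmetic can be reproduced without either package. The sketch below uses plain Python lists for the adjacency matrix of the 3-vertex ring and counts degrees both ways; it illustrates the two conventions and is not how sna or igraph are implemented.

```python
# Symmetric adjacency matrix of a ring on 3 vertices (every vertex has 2 neighbours).
ring = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
n = len(ring)

# Reading the matrix as a directed graph: each undirected link is seen as two arcs.
outdegree = [sum(ring[v]) for v in range(n)]
indegree = [sum(ring[i][v] for i in range(n)) for v in range(n)]
freeman = [i + o for i, o in zip(indegree, outdegree)]

# Reading the same matrix as an undirected graph: each link counts once per endpoint.
undirected = [sum(ring[v]) for v in range(n)]

print(freeman)     # [4, 4, 4]
print(undirected)  # [2, 2, 2]
```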

graph visualization in R basis symmetric matrix having values in diagonal

I have a symmetric matrix which I modified a bit:
The matrix is symmetric, except that I have also added values on the diagonal (their purpose is explained below).
This matrix records how many times each person (A, B, C, D, E) worked with another person on a publication, e.g. B and C worked together 3 times, and A and E worked together 4 times. The diagonal values record how many publications a person worked on in total, e.g. B worked on 4 publications (either alone or with someone else) and C worked on 3 publications.
Now I want to make a network analysis graph in R which describes the relation between the different persons in terms of edge thickness and node size, e.g. the graph should look like this:
In the graph, the node circle size depends on the number of publications a person worked on, e.g. circle B is largest as its diagonal value is the maximum, and A & E are smallest as they have the lowest diagonal values. Also, the edge thickness between nodes depends on how many times they worked together, e.g. the edge between A & E is the thickest as they worked together 4 times, while the edge between B & C is thinner as they worked together 3 times.
I can describe the relation between two persons on the basis of edge thickness; however, the inclusion of the diagonal values is creating problems for me. Is it possible to do this in R? Any leads would be highly appreciated.
You can do this with the igraph package. Because the diagonal means something different from the other entries in the matrix, I have separated the matrix into two pieces, the diagonal and the rest.
Your data
SM = as.matrix(read.table(text="A B C D E
1 2 1 1 4
2 4 3 2 1
1 3 3 1 2
1 2 1 2 1
4 1 2 1 1",
header=TRUE))
rownames(SM) = colnames(SM)
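library(igraph)
# Split the matrix: off-diagonal entries become edge weights,
# diagonal entries become vertex sizes.
AM = SM
diag(AM) = 0
D = diag(SM)
g = graph_from_adjacency_matrix(AM,
                                mode = "undirected",
                                weighted = TRUE)
plot(g,
     edge.width = E(g)$weight,
     vertex.size = 10 + 3*D)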
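The same split can be written out without a graph library, which may help to see what ends up as node size and what ends up as edge width. This is a sketch in plain Python with no plotting; the dictionary names `node_size` and `edge_width` are made up, and the `10 + 3*D` scaling is simply copied from the plot call.

```python
names = ["A", "B", "C", "D", "E"]
SM = [
    [1, 2, 1, 1, 4],
    [2, 4, 3, 2, 1],
    [1, 3, 3, 1, 2],
    [1, 2, 1, 2, 1],
    [4, 1, 2, 1, 1],
]

# Diagonal: publications per person, used for the node size (10 + 3*D scaling).
node_size = {names[i]: 10 + 3 * SM[i][i] for i in range(len(names))}

# Off-diagonal, upper triangle only: one weighted edge per unordered pair.
edge_width = {
    (names[i], names[j]): SM[i][j]
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if SM[i][j] > 0
}

print(node_size)
print(edge_width[("A", "E")], edge_width[("B", "C")])  # 4 3
```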

How to turn data from R data frames into a network

Suppose I have the following data frames
df <- data.frame(dev = c("A","A","B","B","C","C","C"),
proj = c("W","X","Y","X","W","X","Z"))
types <- data.frame(proj = c("W","X","Y","Z"),
type = c("blue","orange","orange","blue"))
> df
  dev proj
1   A    W
2   A    X
3   B    Y
4   B    X
5   C    W
6   C    X
7   C    Z
> types
  proj   type
1    W   blue
2    X orange
3    Y orange
4    Z   blue
I would like to turn these into the following network
The nodes are the unique entries in proj. For nodes u,v, there is an arc from u to v if u and v share an element from dev. The data is a list of developers and projects that each developer has worked on, and I would like to form a network which connects projects that have a developer in common. Each project is of a particular type, and that information would need to be encoded in the graph (I did this in this toy example via colour).
From this graph what I need is the degree of each node, as well as one or more measures of centrality. In particular I need the closeness centrality of each node, as well as a modified version of closeness centrality which measures the centrality within each type. So my end goal is to obtain a table like this:
proj  degree  closeness_centrality  type_centrality
W     2       0.75                  1
X     3       1                     1
Y     1       0.60                  1
Z     2       0.75                  1
For reference, the closeness centrality of a node u is defined as C(u)=(N-1)/(sum over all nodes v of the distance from u to v), where N is the number of nodes in the graph and the distance from u to v is the length of the shortest u-v-path. The type centrality is C(T,u)=|T-u|/(sum over all nodes v in T of the distance from u to v) where T is the set of all nodes of a given type, and |T-u| is the size of T with u excluded (so either |T| or |T|-1 depending on the type of u).
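The definitions above can be turned into a small reference computation on the toy data. This is a sketch in plain Python, not a recommendation for the full 155,000-vertex problem: it assumes the graph is connected, takes T to be the node's own type, and runs one breadth-first search per node. The function names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations

rows = [("A", "W"), ("A", "X"), ("B", "Y"), ("B", "X"),
        ("C", "W"), ("C", "X"), ("C", "Z")]
types = {"W": "blue", "X": "orange", "Y": "orange", "Z": "blue"}

# Group projects by developer, then link every pair of projects sharing a developer.
projects_of = defaultdict(set)
for dev, proj in rows:
    projects_of[dev].add(proj)

neighbours = {proj: set() for proj in types}
for projs in projects_of.values():
    for u, v in combinations(sorted(projs), 2):
        neighbours[u].add(v)
        neighbours[v].add(u)


def distances_from(source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbours[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(u):
    """C(u) = (N - 1) / (sum of distances from u); assumes a connected graph."""
    dist = distances_from(u)
    return (len(neighbours) - 1) / sum(dist.values())


def type_centrality(u, node_type):
    """C(T, u) = |T - u| / (sum of distances from u to the nodes of T other than u)."""
    dist = distances_from(u)
    others = [v for v in types if types[v] == node_type and v != u]
    return len(others) / sum(dist[v] for v in others)


for proj in sorted(types):
    print(proj, len(neighbours[proj]), round(closeness(proj), 2),
          type_centrality(proj, types[proj]))
```

Storing the graph as neighbour sets rather than an adjacency matrix keeps memory proportional to the number of edges, which is what makes a sparse graph of that size feasible at all.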
One of the big challenges is that my actual df has almost 300,000 rows and this graph will have around 155,000 vertices. The average degree will be very low though so I think that it is doable.
My questions are:
Is R the best tool to be using for this? Are there good packages for performing these types of calculations on graphs?
What is the best way to store this kind of data? Should I form an adjacency matrix, or something else?
Any insight or tips at all would be well appreciated; as an economics major I'm kind of in over my head comp-sci-wise here.
Thanks!

Given distances between n points, how to draw a map from those relationships

I came across an interesting question in my textbook, but no further answer or details were supplied :(
Given some points, A, B, C etc
and some distance relationships between those points:
A -> B = 23
A -> C = 45
B -> A = 23
B -> C = 78
C -> A = 45
C -> B = 78
So the distance between C and A is 45 units, between A and B is 23 units, etc.
How to draw a map or some sort of representation? Is it just a case of constraining against those rules until you converge?
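Since it is only 3 points, it is a simple triangle, and you know the lengths of the three sides from the table: 23, 45, and 78 "units". (As given, these numbers violate the triangle inequality, since 23 + 45 < 78, so they cannot be drawn exactly in the plane; the approach works for any three distances that do form a triangle.)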
So you can plot any two of the points as a straight line, then do a little bit of math to determine the angle to the third point (and you already know the distance):
// a, b, and c are the side lengths; C is the angle opposite side c.
c² = b² + a² - 2ba cosC
Solve that for C and you have the angle at point C, so you can plot the third point.
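As a sketch of that calculation: the function below places A at the origin and B on the x-axis, then uses the law of cosines for the angle at A (the same formula with the roles of the points renamed) to position C. The function name and the 3-4-5 example distances are assumptions for illustration; the distances from the question are rejected because they do not form a triangle.

```python
import math


def place_triangle(ab, ac, bc):
    """Place A at the origin, B on the positive x-axis, and C above the axis.

    ab, ac, bc are the three pairwise distances. Raises ValueError when they
    violate the triangle inequality, since no planar triangle exists then.
    """
    # Law of cosines at vertex A: bc^2 = ab^2 + ac^2 - 2*ab*ac*cos(angle at A).
    cos_a = (ab**2 + ac**2 - bc**2) / (2 * ab * ac)
    if not -1.0 <= cos_a <= 1.0:
        raise ValueError("distances violate the triangle inequality")
    angle_a = math.acos(cos_a)
    return {
        "A": (0.0, 0.0),
        "B": (ab, 0.0),
        "C": (ac * math.cos(angle_a), ac * math.sin(angle_a)),
    }


# Hypothetical distances (a 3-4-5 right triangle), not the ones from the question.
points = place_triangle(ab=3.0, ac=4.0, bc=5.0)
print(points)
```

The layout is only determined up to rotation, translation and reflection, so fixing A and B on the x-axis is an arbitrary but convenient choice.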
Edit (I originally missed that this was for N points since it was only in the subject):
If you don't have all of the distances, then you will have to find three that do have all three legs defined to use as a starting point and plot those. After that, find another point that has distances defined to two of your existing points and calculate your new triangle with those three points and plot that one. Repeat this until you run out of points.
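The "distances to two existing points" step is a circle-circle intersection. A minimal sketch follows, assuming 2-D coordinates as tuples; `place_from_two` is a hypothetical helper. Note that it returns two candidates, mirror images of each other, so a distance to a third placed point is needed to pick the right one.

```python
import math


def place_from_two(p, q, dist_p, dist_q):
    """Return the two candidate positions at distance dist_p from p and dist_q from q.

    The candidates are mirror images across the line through p and q.
    """
    d = math.dist(p, q)
    if d == 0 or d > dist_p + dist_q or d < abs(dist_p - dist_q):
        raise ValueError("the two circles do not intersect")
    # Distance from p, along the line p -> q, to the foot of the perpendicular
    # dropped from the new point.
    along = (dist_p**2 - dist_q**2 + d**2) / (2 * d)
    height = math.sqrt(max(dist_p**2 - along**2, 0.0))
    ux, uy = (q[0] - p[0]) / d, (q[1] - p[1]) / d  # unit vector from p to q
    foot = (p[0] + along * ux, p[1] + along * uy)
    return (
        (foot[0] - height * uy, foot[1] + height * ux),
        (foot[0] + height * uy, foot[1] - height * ux),
    )


# Hypothetical example: two placed points 6 apart, new point 5 away from each.
candidates = place_from_two((0.0, 0.0), (6.0, 0.0), 5.0, 5.0)
print(candidates)  # ((3.0, 4.0), (3.0, -4.0))
```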
I think multidimensional scaling is what you want. For example, given distances between U.S. cities, you'll get something like this:
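There may not be a way to perfectly satisfy your constraints in 2- or 3-D, but multidimensional scaling will find a layout that minimizes a cost function measuring the mismatch between the given distances and the drawn ones.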
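For a handful of points, one variant (classical, or Torgerson, MDS) can be sketched in plain Python. This is an illustration under stated assumptions, not a production implementation: it assumes a full symmetric distance matrix that is at least roughly Euclidean, and it finds the leading eigenvectors by power iteration with deflation, where a real project would use a linear algebra library. All function names are hypothetical.

```python
import math
import random


def matvec(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def normalise(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 1e-12 else None


def classical_mds(dist, dims=2, iterations=500, seed=0):
    """Return one coordinate tuple per point from a symmetric distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centring turns squared distances into inner products.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        vec = normalise([rng.uniform(-1.0, 1.0) for _ in range(n)])
        for _ in range(iterations):
            nxt = normalise(matvec(b, vec))
            if nxt is None:  # nothing left in the remaining directions
                break
            vec = nxt
        # Rayleigh quotient v^T B v gives the eigenvalue for this direction.
        eigenvalue = sum(x * y for x, y in zip(vec, matvec(b, vec)))
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflation: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * vec[i] * vec[j]
    return [tuple(c) for c in coords]


# Hypothetical test data: corners of a 4 x 3 rectangle, given only as distances.
original = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
dist = [[math.dist(p, q) for q in original] for p in original]
layout = classical_mds(dist)
print(layout)
```

The recovered coordinates match the original points only up to rotation, reflection and translation, so the thing to compare is the pairwise distances, not the coordinates themselves.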
