Prim's algorithm in Julia with adjacency matrix - julia

I'm trying to implement Prim's algorithm in Julia.
My function gets the adjacency matrix with the weights but isn't working correctly. I don't know what I have to change; I guess there is some problem with the append!() function.
Is there maybe another/better way to implement the algorithm by just passing the adjacency matrix?
Thank you.
function prims(AD)
    n = size(AD)
    n1 = n[1]
    # choose initial vertex from graph
    vertex = 1
    # initialize empty edges array and empty MST
    MST = []
    edges = []
    visited = []
    minEdge = [nothing, nothing, float(Inf)]
    # run prims algorithm until we create an MST
    # that contains every vertex from the graph
    while length(MST) != n1 - 1
        # mark this vertex as visited
        append!(visited, vertex)
        # add each edge to list of potential edges
        for r in 1:n1
            if AD[vertex][r] != 0
                append!(edges, [vertex, r, AD[vertex][r]])
            end
        end
        # find edge with the smallest weight to a vertex
        # that has not yet been visited
        for e in 1:length(edges)
            if edges[e][3] < minEdge[3] && edges[e][2] not in visited
                minEdge = edges[e]
            end
        end
        # remove min weight edge from list of edges
        deleteat!(edges, minEdge)
        # push min edge to MST
        append!(MST, minEdge)
        # start at new vertex and reset min edge
        vertex = minEdge[2]
        minEdge = [nothing, nothing, float(Inf)]
    end
    return MST
end
For example, when I try the algorithm with this adjacency matrix
C = [0 2 3 0 0 0; 2 0 5 3 4 0; 3 5 0 0 4 0; 0 3 0 0 2 3; 0 4 4 2 0 5; 0 0 0 3 5 0]
I get
ERROR: BoundsError
Stacktrace:
[1] getindex(::Int64, ::Int64) at .\number.jl:78
[2] prims(::Array{Int64,2}) at .\untitled-8b8d609f2ac8a0848a18622e46d9d721:70
[3] top-level scope at none:0
I guess I have to "reshape" my matrix C into a form like this
D = [[0,2,3,0,0,0], [2,0,5,3,4,0], [3,5,0,0,4,0], [0,3,0,0,2,3], [0,4,4,2,0,5], [0,0,0,3,5,0]]
But with this I also get the same error.

First note that LightGraphs.jl has this algorithm implemented. You can find the code here.
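If you just need the result, something along these lines should work (treat it as a sketch - the exact constructor and function names can differ between LightGraphs versions, so check the documentation of the version you have installed):
using LightGraphs
g = Graph(C .!= 0)   # build an undirected graph from the non-zero pattern of C
mst = prim_mst(g, C) # pass C as the distance matrix; returns a vector of MST edges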
I have also made some notes on your algorithm to make it work and to indicate potential improvements without changing its general structure:
using DataStructures # library defining PriorityQueue type

function prims(AD::Matrix{T}) where {T<:Real} # make sure we get what we expect
    # make sure the matrix is symmetric
    @assert transpose(AD) == AD
    n1 = size(AD, 1)
    vertex = 1
    # it is better to keep edge information as Tuple rather than a vector
    # also note that I add type annotation for the collection to improve speed
    # and make sure that the stored edge weight has a correct type
    MST = Tuple{Int, Int, T}[]
    # using PriorityQueue makes the lookup of the shortest edge fast
    edges = PriorityQueue{Tuple{Int, Int, T}, T}()
    # we will eventually visit almost all vertices so we can use indicator vector
    visited = falses(n1)
    while length(MST) != n1 - 1
        visited[vertex] = true
        for r in 1:n1
            # you access a matrix by passing indices separated by a comma
            dist = AD[vertex, r]
            # no need to add edges to vertices that were already visited
            if dist != 0 && !visited[r]
                edges[(vertex, r, dist)] = dist
            end
        end
        # we will iterate till we find an unvisited destination vertex
        while true
            # handle the case if the graph was not connected
            isempty(edges) && return nothing
            minEdge = dequeue!(edges)
            if !visited[minEdge[2]]
                # use push! instead of append!
                push!(MST, minEdge)
                vertex = minEdge[2]
                break
            end
        end
    end
    return MST
end
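For completeness, this is how you could try it on the adjacency matrix from the question (assuming the DataStructures package is installed). The five returned edges should have weights summing to 13, the weight of a minimum spanning tree of this graph:
C = [0 2 3 0 0 0; 2 0 5 3 4 0; 3 5 0 0 4 0; 0 3 0 0 2 3; 0 4 4 2 0 5; 0 0 0 3 5 0]
mst = prims(C)                  # vector of (from, to, weight) tuples
println(mst)
println(sum(e[3] for e in mst)) # total MST weight, 13 for this graph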

Related

Max Recursion Depth Error With Grid Search Problem

I've written out a potential solution to a Leetcode problem but I get this error involving maximum recursion depth. I'm really unsure what I'm doing wrong. Here's what I've tried writing:
def orangesRotting(grid):
    R,C = len(grid), len(grid[0])
    seen = set()
    min_time = 0
    def fresh_search(r,c, time):
        if ((r,c,time) in seen or r < 0 or c < 0 or r >= R or c >= C or grid[r][c] == 0):
            return
        elif grid[r][c] == 2:
            seen.add((r,c,0))
        elif grid[r][c] == 1:
            seen.add((r,c, time + 1))
        fresh_search(r+1,c,time+1)
        fresh_search(r-1,c,time+1)
        fresh_search(r,c+1,time+1)
        fresh_search(r,c-1,time+1)
    for i in range(R):
        for j in range(C):
            if grid[i][j] == 2:
                fresh_search(i,j,0)
    for _,_,t in list(seen):
        min_time = max(min_time,t)
    return min_time
Even on a simple input like grid = [[2,1,1], [1,1,0], [0,1,1]], the offending line always appears to be the if statement
if ((r,c,time) in seen or r < 0 or c < 0 or r >= R or c >= C or grid[r][c] == 0):
Please note, I'm not looking for help in solving the problem, just understanding why I'm running into this massive recursion issue. For reference, here is the link to the problem. Any help would be appreciated.
So let's trace through what you are doing here. You iterate through the entire grid, and if the value of a cell is 2 you call fresh_search for that cell. We'll start with [0,0].
In fresh_search you then add the cell to your set with time = 0.
Now fresh_search is called for all neighboring cells, so we'll just look at r+1. For r+1, fresh_search adds that cell to your set and then calls fresh_search again for all of its neighbors.
Next we'll just look at r-1, which is our origin. fresh_search is now being called on this cell with time = 2, and the guard doesn't trigger because (0,0,0) != (0,0,2), so the cell is processed again and fresh_search is called on the r+1 cell once more, now with time = 3,
and so on and so forth until the maximum recursion depth is reached. Because the tuples in seen include the time, a cell you have already visited is never recognized as visited when it is reached again with a different time value.

All path *lengths* from source to target in Directed Acyclic Graph

I have a graph with an adjacency matrix of shape adj_mat.shape = (4000, 4000). My current problem involves finding the list of path lengths (the sequence of nodes is not so important) that traverse from the source (row = 0) to the target (col = trans_mat.shape[0] - 1).
I am not interested in finding the path sequences; I am only interested in propagating the path lengths. As a result, this is different from finding all simple paths - which would be too slow (i.e. find all paths from source to target, then score each path). Is there a performant way to do this quickly?
DFS is suggested as one possible strategy (noted here). My current implementation (below) is simply not optimal:
# create graph
G = nx.from_numpy_matrix(adj_mat, create_using=nx.DiGraph())
# initialize nodes
for node in G.nodes:
    G.nodes[node]['cprob'] = []
# set starting node value
G.nodes[0]['cprob'] = [0]
def propagate_prob(G, node):
    # find incoming edges to node
    predecessors = list(G.predecessors(node))
    curr_node_arr = []
    for prev_node in predecessors:
        # get incoming edge weight
        edge_weight = G.get_edge_data(prev_node, node)['weight']
        # get predecessor node value
        if len(G.nodes[prev_node]['cprob']) == 0:
            G.nodes[prev_node]['cprob'] = propagate_prob(G, prev_node)
        prev_node_arr = G.nodes[prev_node]['cprob']
        # add incoming edge weight to prev_node arr
        curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + np.array(prev_node_arr)])
    # update current node array
    G.nodes[node]['cprob'] = curr_node_arr
    return G.nodes[node]['cprob']
# calculate all path lengths from source to sink
part_func = propagate_prob(G, 4000)
I don't have a large example at hand (e.g. > 300 nodes), but I found a non-recursive solution:
import networkx as nx
g = nx.DiGraph()
nx.add_path(g, range(7))
g.add_edge(0, 3)
g.add_edge(0, 5)
g.add_edge(1, 4)
g.add_edge(3, 6)
# first step retrieve topological sorting
sorted_nodes = nx.algorithms.topological_sort(g)
start = 0
target = 6
path_lengths = {start: [0]}
for node in sorted_nodes:
    if node == target:
        print(path_lengths[node])
        break
    if node not in path_lengths or g.out_degree(node) == 0:
        continue
    new_path_length = path_lengths[node]
    new_path_length = [i + 1 for i in new_path_length]
    for successor in g.successors(node):
        if successor in path_lengths:
            path_lengths[successor].extend(new_path_length)
        else:
            path_lengths[successor] = new_path_length.copy()
    if node != target:
        del path_lengths[node]
Output: [2, 4, 2, 4, 4, 6]
If you are only interested in the number of paths of each length, e.g. {2: 2, 4: 3, 6: 1} for the above example, you could even reduce the lists to dicts.
Background
Some explanation of what I'm doing (and which I hope works for larger examples as well). The first step is to retrieve the topological sorting. Why? Because then I know in which "direction" the edges flow, and I can simply process the nodes in that order without "missing any edge" or any "backtracking" like in a recursive variant. Afterwards, I initialise the start node with a list containing the current path length ([0]). This list is copied to all successors, while updating the path length (all elements +1). The goal is that in each iteration the path lengths from the starting node to all processed nodes are calculated and stored in the dict path_lengths. The loop stops after reaching the target node.
With igraph I can calculate up to 300 nodes in ~1 second. I also found that accessing the adjacency matrix itself (rather than calling igraph functions to retrieve edges/vertices) saves time. The two key bottlenecks are 1) appending a long list in an efficient manner (while also keeping memory in check) and 2) finding a way to parallelize. The run time grows exponentially past ~300 nodes; I would love to see if someone has a faster solution (while also fitting into memory).
import igraph
# create graph from adjacency matrix
G = igraph.Graph.Adjacency((trans_mat_pad > 0).tolist())
# add edge weights
G.es['weight'] = trans_mat_pad[trans_mat_pad.nonzero()]
# initialize nodes
for node in range(trans_mat_pad.shape[0]):
    G.vs[node]['cprob'] = []
# set starting node value
G.vs[0]['cprob'] = [0]
def propagate_prob(G, node, trans_mat_pad):
    # find incoming edges to node
    predecessors = trans_mat_pad[:, node].nonzero()[0]  # G.get_adjlist(mode='IN')[node]
    curr_node_arr = []
    for prev_node in predecessors:
        # get incoming edge weight
        edge_weight = trans_mat_pad[prev_node, node]  # G.es[prev_node]['weight']
        # get predecessor node value
        if len(G.vs[prev_node]['cprob']) == 0:
            curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + propagate_prob(G, prev_node, trans_mat_pad)])
        else:
            curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + np.array(G.vs[prev_node]['cprob'])])
        ## NB: If memory constraint, uncomment below
        # set max size
        # if len(curr_node_arr) > 100:
        #     curr_node_arr = np.sort(curr_node_arr)[:100]
    # update current node array
    G.vs[node]['cprob'] = curr_node_arr
    return G.vs[node]['cprob']
# calculate path lengths
path_len = propagate_prob(G, trans_mat_pad.shape[0]-1, trans_mat_pad)

How do I finish a recursive binary search in fortran?

I am trying to find the smallest index containing the value i in a sorted array. If this value i is not present, I want -1 to be returned. I am using a recursive binary search subroutine. The problem is that I can't really stop the recursion, and I get a lot of answers (one right and the rest wrong). Sometimes I instead get a "segmentation fault: 11" error and don't get any results at all.
I've tried deleting the call to random_number since I already have a sorted array in my main program, but it did not work.
program main
  implicit none
  integer, allocatable :: A(:)
  real :: MAX_VALUE
  integer :: i,j,n,s, low, high
  real :: x
  N= 10 !size of table
  MAX_VALUE = 10
  allocate(A(n))
  s = 5 ! searched value
  low = 1 ! lower limit
  high = n ! highest limit
  !generate random table of numbers (from 0 to 1000)
  call Random_Seed
  do i=1, N
    call Random_Number(x) !returns random x >= 0 and <1
    A(i)= anint(MAX_VALUE*x)
  end do
  call bubble(n,a)
  print *,' '
  write(*,10) (a(i),i=1,N)
10 format(10i6)
  call bsearch(A,n,s,low,high)
  deallocate(A)
end program main
The sort subroutine:
subroutine sort(p,q)
  implicit none
  integer(kind=4), intent(inout) :: p, q
  integer(kind=4) :: temp
  if (p>q) then
    temp = p
    p = q
    q = temp
  end if
  return
end subroutine sort
The bubble subroutine:
subroutine bubble(n,arr)
  implicit none
  integer(kind=4), intent(inout) :: n
  integer(kind=4), intent(inout) :: arr(n)
  integer(kind=4) :: sorted(n)
  integer :: i,j
  do i=1, n
    do j=n, i+1, -1
      call sort(arr(j-1), arr(j))
    end do
  end do
  return
end subroutine bubble
recursive subroutine bsearch(b,n,i,low,high)
  implicit none
  integer(kind=4) :: b(n)
  integer(kind=4) :: low, high
  integer(kind=4) :: i,j,x,idx,n
  real(kind=4) :: r
  idx = -1
  call random_Number(r)
  x = low + anint((high - low)*r)
  if (b(x).lt.i) then
    low = x + 1
    call bsearch(b,n,i,low,high)
  else if (b(x).gt.i) then
    high = x - 1
    call bsearch(b,n,i,low,high)
  else
    do j = low, high
      if (b(j).eq.i) then
        idx = j
        exit
      end if
    end do
  end if
  ! Stop if high = low
  if (low.eq.high) then
    return
  end if
  print*, i, 'found at index ', idx
  return
end subroutine bsearch
The goal is to get the same results as my linear search. But I'm getting either of these answers.
Sorted table:
1 1 2 4 5 5 6 7 8 10
5 found at index 5
5 found at index -1
5 found at index -1
or if the value is not found
2 2 3 4 4 6 6 7 8 8
Segmentation fault: 11
There are two issues causing your recursive search routine bsearch to either stop with unwanted output or result in a segmentation fault. Simply following the execution logic of your program for the examples you provided elucidates the matter:
1) value present and found, unwanted output
First, consider the first example, where array b contains the value i=5 you are searching for (value and index pointed out with || in the first two lines of the code block below). Using the notation Rn to indicate the n'th level of recursion, L and H for the lower and upper bounds, and x for the current index estimate, a given run of your code could look something like this:
b(x): 1 1 2 4 |5| 5 6 7 8 10
x: 1 2 3 4 |5| 6 7 8 9 10
R0: L x H
R1: Lx H
R2: L x H
5 found at index 5
5 found at index -1
5 found at index -1
In R0 and R1, the tests b(x).lt.i and b(x).gt.i in bsearch work as intended to reduce the search interval. In R2 the do-loop in the else branch is executed, idx is assigned the correct value and this is printed - as intended. However, a return statement is now encountered which will return control to the calling program unit - in this case that is first R1(!) where execution will resume after the if-else if-else block, thus printing a message to screen with the initial value of idx=-1. The same happens upon returning from R0 to the main program. This explains the (unwanted) output you see.
2) value not present, segmentation fault
Secondly, consider the example resulting in a segmentation fault. Using the same notation as before, a possible run could look like this:
b(x): 2 2 3 4 4 6 6 7 8 8
x: 1 2 3 4 5 6 7 8 9 10
R0: L x H
R1: L x H
R2: L x H
R3: LxH
R4: H xL
.
.
.
Segmentation fault: 11
In R0 to R2 the search interval is again reduced as intended. However, in R3 the logic fails. Since the search value i is not present in array b, one of the .lt. or .gt. tests will always evaluate to .true., meaning that the test for low .eq. high to terminate a search is never reached. From this point onwards, the logic is no longer valid (e.g. high can be smaller than low) and the code will continue deepening the level of recursion until the call stack gets too big and a segmentation fault occurs.
That explains the main logical flaws in the code. A possible inefficiency is the use of a do-loop to find the lowest index containing a searched-for value. Consider a case where the value you are looking for is e.g. i=8 and it appears in the last positions of your array, as below. Assume further that, by chance, the first guess for its position is x = high. This implies that your code will immediately branch to the do-loop, where in effect a linear search of very nearly the entire array is done to find the final result idx=9. Although correct, the intended binary search effectively becomes a linear search, which could result in reduced performance.
b(x): 2 2 3 4 4 6 6 7 |8| 8
x: 1 2 3 4 5 6 7 8 |9| 10
R0: L xH
8 found at index 9
Fixing the problems
At the very least, you should move the low .eq. high test to the start of the bsearch routine, so that recursion stops before invalid bounds can be defined (you then need an additional test to see if the search value was found or not). Also, notify about a successful search right after it occurs, i.e. after the equality test in your do-loop, or the additional test just mentioned. This still does not address the inefficiency of a possible linear search.
All taken into account, you are probably better off reading up on algorithms for finding a "leftmost" index (e.g. on Wikipedia, or look at a tried and tested implementation - both examples here use iteration instead of recursion, perhaps another improvement, but the same principles apply) and adapting that to Fortran, which could look something like this (only new code is shown; "..." refers to existing code from your example):
module mod_search
  implicit none
contains

  ! Function that uses recursive binary search to look for `key` in an
  ! ordered `array`. Returns the array index of the leftmost occurrence
  ! of `key` if present in `array`, and -1 otherwise
  function search_ordered (array, key) result (idx)
    integer, intent(in) :: array(:)
    integer, intent(in) :: key
    integer :: idx

    ! find left most array index that could possibly hold `key`
    idx = binary_search_left(1, size(array))

    ! if `key` is not found, return -1
    if (array(idx) /= key) then
      idx = -1
    end if

  contains

    ! function for recursive reduction of search interval
    recursive function binary_search_left(low, high) result(idx)
      integer, intent(in) :: low, high
      integer :: idx
      real :: r

      if (high <= low) then
        ! found lowest possible index where target could be
        idx = low
      else
        ! new guess
        call random_number(r)
        idx = low + floor((high - low)*r)
        ! alternative: idx = low + (high - low) / 2
        if (array(idx) < key) then
          ! continue looking to the right of current guess
          idx = binary_search_left(idx + 1, high)
        else
          ! continue looking to the left of current guess (inclusive)
          idx = binary_search_left(low, idx)
        end if
      end if
    end function binary_search_left
  end function search_ordered

  ! Move your routines into a module
  subroutine sort(p,q)
    ...
  end subroutine sort

  subroutine bubble(n,arr)
    ...
  end subroutine bubble

end module mod_search

! your main program
program main
  use mod_search, only : search_ordered, sort, bubble ! <---- use routines from module like so
  implicit none
  ...
  ! Replace your call to bsearch() with the following:
  ! call bsearch(A,n,s,low,high)
  i = search_ordered(A, s)
  if (i /= -1) then
    print *, s, 'found at index ', i
  else
    print *, s, 'not found!'
  end if
  ...
end program main
Finally, depending on your actual use case, you could also just consider using the Fortran intrinsic procedure minloc saving you the trouble of implementing all this functionality yourself. In this case, it can be done by making the following modification in your main program:
! i = search_ordered(a, s) ! <---- comment out this line
j = minloc(abs(a-s), dim=1) ! <---- replace with these two
i = merge(j, -1, a(j) == s)
where j returned from minloc will be the lowest index in the array a where s may be found, and merge is used to return j when a(j) == s and -1 otherwise.

Using method source(edge) of Package Graphs.jl in Julia on Juliabox.org

Take a look at the following simple code example:
Pkg.add("Graphs")
using Graphs
gd = simple_graph(20, is_directed=true) # directed graph with 20 nodes
nodeTo = 2
for nodeFrom in vertices(gd) # add some edges...
if(nodeTo != 20)
add_edge!(gd, nodeFrom, nodeTo)
nodeTo +=1
end
end
for i in edges(gd) # Print source and target for every edge in gd
println("Target: ",target(i))
println("Source: ", source(i))
end
So it works sometimes and it prints the source and targets of the edges, but most of the time when I run the cell (after working in this or other cells, or doing nothing) I get the following error:
type: anonymous: in apply, expected Function, got Int64
while loading In[11], in expression starting on line 14
in anonymous at no file:16
I have not changed any code concerning the method nor the cell, but it doesn't work anymore. The method target(edge) works fine, but the method source(edge) causes problems most of the time.
http://graphsjl-docs.readthedocs.org/en/latest/graphs.html#graph
What should I do? I would be pleased to get some help.
After some thought, I found out that the mistake has to be in the code between the hashtags:
Pkg.add("JuMP")
Pkg.add("Cbc")
# Pkg.update()
using Graphs
using JuMP
using Cbc
function createModel(graph, costs, realConnections)
m = Model(solver = CbcSolver())
#defVar(m, 0 <= x[i=1:realConnections] <= 1, Int)
#setObjective(m, Min, dot(x,costs[i=1:realConnections]))
println(m)
for vertex in vertices(graph)
edgesIn = Int64[] # Array of edges going in the vertex
edgesOut = Int64[] # Array of Edges going out of the vertex
for edge in edges(graph)
if (target(edge) == vertex) # works fine
push!(edgesIn, edge_index(edge))
end
if (source(edge) == vertex) # does not work
push!(edgesOut, edge_index(edge))
print(source(edge), " ")
end
end
# #addConstraint(m, sum{x[edgesIn[i]], i=1:length(edgesIn)} - sum{x[edgesOut[j]], j=1:length(edgesOut)} == 0)
end
return m
end
file = open("csp50.txt")
g = createGraph(file) # g = g[1] = simpleGraph Object with 50 nodes and 173 arccs, g[2] = number of Nodes g[3]/g[4] = start/end time g[5] = costs of each arc
# After running this piece of code, the source(edge) method does not work anymore
########################################################################################
# adding a source and sink node and adding edges between every node of the orginal graph and the source and sink node
realConnections = length(g[5]) # speichern der Kanten
source = (num_vertices(g[1])+1)
sink = (num_vertices(g[1])+2)
add_vertex!(g[1], source)
add_vertex!(g[1], sink)
push!(g[3], 0)
push!(g[3], 0)
push!(g[4], 0)
push!(g[4], 0)
for i in vertices(g[1])
if (i != source)
add_edge!(g[1], source, i) # edge from source to i
push!(g[5], 0)
end
if (i != sink)
add_edge!(g[1], i, sink) # Kante von i zu Senke
push!(g[5], 0) # Keine Kosten/Zeit fuer diese Kante
end
end
######################################################################################
numEdges = num_edges(g[1]);
createModel(g[1], g[5], realConnections)
From Julia's Manual:
Julia will even let you redefine built-in constants and functions if needed:
julia> pi
π = 3.1415926535897...
julia> pi = 3
Warning: imported binding for pi overwritten in module Main
3
julia> pi
3
julia> sqrt(100)
10.0
julia> sqrt = 4
Warning: imported binding for sqrt overwritten in module Main
4
However, this is obviously not recommended to avoid potential confusion.
So reusing source as a variable "unbound" it from its function definition. Using a different variable name should preserve the Graphs.jl definition for it.
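For example, a minimal fix is to rename the two variables so they no longer collide with anything exported by Graphs.jl (srcNode and sinkNode below are just placeholder names):
# rename the shadowing variables and keep the rest of the code as it is
srcNode  = num_vertices(g[1]) + 1
sinkNode = num_vertices(g[1]) + 2
add_vertex!(g[1], srcNode)
add_vertex!(g[1], sinkNode)
# ... add the extra edges using srcNode/sinkNode instead of source/sink ...
# source(edge) now still refers to the Graphs.jl function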

Graph branch decomposition

Hello,
I would like to know about an algorithm to produce a graph decomposition into branches with rank in the following way:
Rank | path (or tree branch)
  0  | 1-2
  1  | 2-3-4-5-6
  1  | 2-7
  2  | 7-8
  2  | 7-9
Node 1 would be the root node, and nodes 6, 8 and 9 would be the end nodes.
The rank of a branch should be given by the number of bifurcation nodes up to the root node. Let's assume that the graph has no loops (but I'd prefer not to have that constraint).
I am an electrical engineer, and perhaps this is a very standard problem, but so far I have only found the BFS algorithm to get the paths, and all the cut set stuff. I also don't know if this applies.
I hope that my question is clear enough.
PS: should this question be on Stack Overflow?
From your example, I'm making some assumptions:
You want to bifurcate whenever a node's degree is > 2
Your input graph is acyclic
With an augmented BFS this is possible from the root r. The following will generate comp_groups, which will be a list of components (each of which is a list of its member vertices). The rank of each component will be under the same index in the list rank.
comp[1..n] = -1                           // init all vertices to belong to no components
comp[r] = 0                               // r is part of component 0
comp_groups = [[r]]                       // a list of lists, with the start of component 0
rank[0] = 0                               // component 0 (contains root) has rank 0
next_comp_id = 1
queue = {r}                               // queues for BFS
next_queue = {}
while !queue.empty()
    for v in queue
        for u in neighbors(v)
            if comp[u] == -1                          // test if u is unvisited
                if degree(v) > 2
                    comp[u] = next_comp_id            // start a new component
                    next_comp_id += 1
                    rank[comp[u]] = rank[comp[v]] + 1 // new comp's rank is +1
                    comp_groups[comp[u]] += [v]       // add v (the bifurcation node) to the new component
                else
                    comp[u] = comp[v]                 // use same component
                comp_groups[comp[u]] += [u]           // add u to the component
                next_queue += {u}                     // add u to next frontier
    queue = next_queue                    // move on to next frontier
    next_queue = {}
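If it helps, here is a rough Julia sketch of the same augmented BFS (the function name branch_decompose and the adjacency-list input format are my own choices, so adapt as needed):
# Rough sketch: decompose an acyclic graph into branches with ranks.
# adj is an adjacency list (vector of neighbor lists), r is the root vertex.
function branch_decompose(adj::Vector{Vector{Int}}, r::Int)
    comp = fill(-1, length(adj))    # component id per vertex, -1 = unvisited
    comp[r] = 1
    comp_groups = [[r]]             # vertices of each branch
    ranks = [0]                     # rank of each branch
    queue = [r]
    while !isempty(queue)
        next_queue = Int[]
        for v in queue, u in adj[v]
            comp[u] == -1 || continue             # skip visited vertices
            if length(adj[v]) > 2                 # v is a bifurcation node
                push!(comp_groups, [v])           # start a new branch at v
                push!(ranks, ranks[comp[v]] + 1)  # new branch's rank is +1
                comp[u] = length(comp_groups)
            else
                comp[u] = comp[v]                 # stay in the current branch
            end
            push!(comp_groups[comp[u]], u)
            push!(next_queue, u)
        end
        queue = next_queue
    end
    return comp_groups, ranks
end
# The example graph from the question (root = 1):
adj = [[2], [1, 3, 7], [2, 4], [3, 5], [4, 6], [5], [2, 8, 9], [7], [7]]
groups, ranks = branch_decompose(adj, 1)
# groups should be [[1, 2], [2, 3, 4, 5, 6], [2, 7], [7, 8], [7, 9]]
# ranks  should be [0, 1, 1, 2, 2]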
