How to remove self-loops in LightGraphs - Julia

I am new to Julia and LightGraphs, and I have been trying to find the most efficient way of detecting and removing self-loops. So far, the only way I have found is to iterate over all vertices of the SimpleGraph, check whether each has a self-loop, and remove it. Is there any better way, like this combination in Python NetworkX: G.remove_edges_from(G.selfloop_edges())?
The way I am doing it right now:
using LightGraphs, GraphIO
path = raw"adrs\to\my\edgeList"   # placeholder path to the edge list file
G = SimpleGraph(loadgraph(path, GraphIO.EdgeList.EdgeListFormat()))
for node in vertices(G)
    if has_edge(G, node, node)
        rem_edge!(G, node, node)
    end
end

That's probably the best way to do it conditionally, but you can just call rem_edge!(G, node, node) without the has_edge() check - it returns a Bool indicating whether the edge was removed, so it is safe to call even when the edge is not actually there.
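For example, a minimal sketch that uses that return value (assuming G is the graph loaded above):
# rem_edge! returns true only when an edge was actually removed, so counting
# the successful calls also tells you how many self-loops the graph had.
removed = count(v -> rem_edge!(G, v, v), vertices(G))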

You can find the vertices that have self-loops with:
vxs = Iterators.flatten(simplecycles_limited_length(G, 1))
To remove them, just do:
rem_edge!.(Ref(G), vxs, vxs)   # Ref(G) makes broadcasting treat the graph as a scalar

I did a quick benchmark comparing my solution (without the has_edge(), thanks @sbromberger!) and the one proposed by @Przemyslaw (looks quite neat!). It seems my simple way of doing it is still the most efficient, both in terms of memory and time. I was surprised to see that simplecycles_limited_length() does worse than the loop, considering the function seems to be for this specific purpose. If you know why that is, please let me know.
Here are my benchmarking results (my_graph has 22,470 nodes and 170,823 edges with 179 self-loops):
using BenchmarkTools
function sl1(G)
    for node in vertices(G)
        rem_edge!(G, node, node)
    end
end
function sl2(G)
    vxs = Iterators.flatten(simplecycles_limited_length(G, 1))
    rem_edge!.(Ref(G), vxs, vxs)
end
@benchmark sl1(my_graph)
>>> BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 554.401 μs (0.00% GC)
median time: 582.899 μs (0.00% GC)
mean time: 592.032 μs (0.00% GC)
maximum time: 1.292 ms (0.00% GC)
--------------
samples: 8440
evals/sample: 1
@benchmark sl1($my_graph)
>>> BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 555.500 μs (0.00% GC)
median time: 603.501 μs (0.00% GC)
mean time: 616.309 μs (0.00% GC)
maximum time: 1.281 ms (0.00% GC)
--------------
samples: 8108
evals/sample: 1
@benchmark sl2(my_graph)
>>> BenchmarkTools.Trial:
memory estimate: 448 bytes
allocs estimate: 6
--------------
minimum time: 792.400 μs (0.00% GC)
median time: 836.000 μs (0.00% GC)
mean time: 855.634 μs (0.00% GC)
maximum time: 1.836 ms (0.00% GC)
--------------
samples: 5839
evals/sample: 1
@benchmark sl2($my_graph)
>>> BenchmarkTools.Trial:
memory estimate: 448 bytes
allocs estimate: 6
--------------
minimum time: 795.600 μs (0.00% GC)
median time: 853.250 μs (0.00% GC)
mean time: 889.450 μs (0.00% GC)
maximum time: 2.022 ms (0.00% GC)
--------------
samples: 5618
evals/sample: 1
@btime sl1(my_graph)
>>> 555.999 μs (0 allocations: 0 bytes)
@btime sl1($my_graph)
>>> 564.000 μs (0 allocations: 0 bytes)
@btime sl2(my_graph)
>>> 781.800 μs (6 allocations: 448 bytes)
@btime sl2($my_graph)
>>> 802.200 μs (6 allocations: 448 bytes)
Edit: Added the interpolated benchmarks as requested.

Related

Efficient grid approximation of likelihood in Julia

A simple approach to approximating the maximum likelihood of a model given some data is grid approximation. For example, in R, we can generate a grid of parameter values and then evaluate the likelihood of each value given some data (example from Statistical Rethinking by McElreath):
p_grid <- seq(from=0, to=1, length.out=1000)
likelihood <- dbinom(6, size=9, prob=p_grid)
Here, likelihood is an array of 1000 values and I assume this is an efficient way to get such an array.
I am new to Julia (and not so good at R), so my approach to doing the same as above relies on comprehension syntax:
using Distributions
p_grid = collect(LinRange(0, 1, 1000))
likelihood = [pdf(Binomial(9, p), 6) for p in p_grid]
which is not only clunky but somehow seems inefficient because a new Binomial gets constructed 1000 times. Is there a better, perhaps vectorized, approach to accomplishing the same task?
In languages like R or Python, people often use the term "vectorization" to mean "avoid for loops in the language". I say "in the language" because there are still for loops, it's just that they're now in C instead of R/Python.
In Julia, there's nothing to worry about. You'll still sometimes hear "vectorization", but Julia folks tend to use this in the original sense of hardware vectorization. More on that here.
As for your code, I think it's fine. To be sure, let's benchmark!
julia> using BenchmarkTools
julia> @btime [pdf(Binomial(9, p), 6) for p in $p_grid]
111.352 μs (1 allocation: 7.94 KiB)
Another way you could write this is using map:
julia> @btime map($p_grid) do p
           pdf(Binomial(9, p), 6)
       end;
111.623 μs (1 allocation: 7.94 KiB)
To check for construction overhead, you could make lower-level calls to StatsFuns, like this
julia> using StatsFuns
julia> @btime map($p_grid) do p
           binompdf(9, p, 6)
       end;
109.809 μs (1 allocation: 7.94 KiB)
It looks like there is some difference, but it's pretty minor, maybe around 2% of the overall cost.
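If you prefer a more "vectorized" spelling, broadcasting does the same thing; a minimal sketch (not benchmarked here):
# Nested dot calls fuse into a single loop, so no intermediate array of
# Binomial objects is materialized along the way.
likelihood = pdf.(Binomial.(9, p_grid), 6)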

How does powermetrics work?

Apple seems to have upgraded the output of powermetrics on M1 CPUs to include reports of consumed power. Output looks roughly like this:
sudo powermetrics | grep -i power
....
E-Cluster Power: 230 mW
P0-Cluster Power: 3475 mW
P1-Cluster Power: 268 mW
ANE Power: 0 mW
DRAM Power: 1037 mW
CPU Power: 3973 mW
GPU Power: 125 mW
Package Power: 7348 mW
GPU Power: 125 mW
Are any of these reported powers actually measured, e.g. measured off of any of the voltage regulators? Or is it a case of a table lookup, e.g. a pre-characterized table of estimated power based on CPU/GPU core workloads from which the OS just returns a value?
What is included in Package Power? I would have expected the sum ANE+CPU+GPU+DRAM to be close to the total Package Power. Is that difference caused by the power for all the glue logic surrounding the CPU/GPU/ANE and all the IOs on the M1?

How much faster is Eigen for small fixed size matrices?

I'm using Julia at the moment, but I have a performance-critical function which requires an enormous number of repeated matrix operations on small fixed-size matrices (3×3 or 4×4). It seems that all the matrix operations in Julia are handled by a BLAS and LAPACK back end, and there also appears to be a lot of memory allocation going on within some of these functions.
There is a Julia library for small matrices that boasts impressive speedups for 3×3 matrices, but it has not been updated in 3 years. I am considering rewriting my performance-critical function in Eigen.
I know that Eigen claims to be really good for fixed-size matrices, but I am still trying to judge whether I should rewrite this function in Eigen or not. Its published performance benchmarks are for dynamically sized matrices. Does anyone have any data to suggest how much performance one gets from the fixed-size matrices? The types of operations I'm doing are matrix × matrix, matrix × vector, and positive-definite linear solves.
If you want fast operations for small matrices, I highly recommend StaticArrays. For example (NOTE: this was originally written before the BenchmarkTools package, which is now recommended):
using StaticArrays
using LinearAlgebra

# Allocating version: A*b creates a fresh vector on every iteration.
function foo(A, b, n)
    s = 0.0
    for i = 1:n
        s += sum(A*b)
    end
    s
end

# Preallocating version: reuses c through the in-place mul!.
function foo2(A, b, n)
    c = A*b
    s = 0.0
    for i = 1:n
        mul!(c, A, b)
        s += sum(c)
    end
    s
end
A = rand(3,3)
b = rand(3)
Af = SMatrix{3,3}(A)
bf = SVector{3}(b)
# Warm-up calls so compilation is not included in the timings
foo(A, b, 1)
foo2(A, b, 1)
foo(Af, bf, 1)
@time foo(A, b, 10^6)
@time foo2(A, b, 10^6)
@time foo(Af, bf, 10^6)
Results:
julia> include("/tmp/foo.jl")
0.080535 seconds (1.00 M allocations: 106.812 MiB, 14.86% gc time)
0.064963 seconds (3 allocations: 144 bytes)
0.001719 seconds (2 allocations: 32 bytes)
foo2 tries to be clever and avoid memory allocation, yet it's simply blown away by the naive implementation when using StaticArrays.
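For completeness, here is a sketch of the same comparison using BenchmarkTools (the $ interpolation keeps the global variables from skewing the timings; actual numbers will vary by machine):
using BenchmarkTools
@btime foo($A, $b, 10^6)
@btime foo2($A, $b, 10^6)
@btime foo($Af, $bf, 10^6)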

Calculating optimal window size

I'm currently trying to calculate the optimal window size. I got these variables:
Propagation delay: 10 ms
Bit speed: 100 kbps
Frame size: 20 bytes
When the propagation delay is 10 ms we hit a limit at a window size of 13, and when the propagation delay is 20 ms we hit a limit at a window size of 24.
Is there any formula to calculate the maximum window size?
The formula for your question is:
window size = (bit rate × 2·tp) / (frame size × 8)
Where:
bit rate = 100 kbps = 100,000 bits/s in your case
2·tp = RTT (the time it takes a frame to be sent and the acknowledgement to return), which in your case is 2 × 10 ms = 20 ms
frame size = 20 bytes, multiplied by 8 to get the size in bits (160 bits)
window size = the thing you want calculated
Hope I was helpful!
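Plugging the numbers in (a quick sketch in Julia, with everything converted to bits and seconds):
bitrate = 100_000              # 100 kbps in bits per second
tp      = 0.010                # propagation delay: 10 ms
frame   = 20 * 8               # frame size: 20 bytes = 160 bits
w = (bitrate * 2 * tp) / frame # bits in flight per RTT / bits per frame = 12.5
windowsize = ceil(Int, w)      # = 13, matching the limit in the question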
Bandwidth times delay. It's called the bandwidth-delay product.

Calculating MBPS - Average

I have found below calculation from http://www.gridsouth.com/services/colocation/basics/bandwidth
I have no idea how they came up with the final number of 1.395 MBPS. Can you please help me with the formula used in the example below?
If your network provider bills you on average usage, let's say they sample your MBPS usage 100 times in one month (typically it would be more like every 5 minutes) and of those samples your network usage was measured as follows: 20 times 0.1 MBPS, 30 times 1.5 MBPS, 30 times 1.8 MBPS, 15 times 1.9 MBPS and 5 times 2 MBPS. If you average all these samples you would be billed at whatever the price is in your contract for 1.395 MBPS bandwidth.
(20*0.1 + 30*1.5 + 30*1.8 + 15*1.9 + 5*2)/100 = (2 + 45 + 54 + 28.5 + 10)/100 = 139.5/100 = 1.395
Looks like the standard definition for a (weighted) average...
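The same computation as a quick sketch in Julia, using the counts and rates from the example:
counts = [20, 30, 30, 15, 5]               # how often each rate was sampled
rates  = [0.1, 1.5, 1.8, 1.9, 2.0]         # the measured MBPS values
avg = sum(counts .* rates) / sum(counts)   # 139.5 / 100 == 1.395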
