I am working with very large graphs and their corresponding weighted adjacency matrices, and I need to take these large matrices to similarly large powers (i.e. raising matrices to the power of tens of thousands).
The issue I have run into is that elements of the matrix quickly become too large for the computer to handle, and I am wondering how to get around this problem.
Has anyone worked with such problems before (raising matrices to large powers), and how did you resolve them?
I know Python's numpy can handle these computations. Is there an analogous library in Julia that can do this as well?
You could convert the element type to BigFloat:
julia> A = [1.5 2 -4; 3 -1 -6; -10 2.3 4]
3×3 Array{Float64,2}:
1.5 2.0 -4.0
3.0 -1.0 -6.0
-10.0 2.3 4.0
julia> (BigFloat.(A))^32000
3×3 Array{BigFloat,2}:
4.16164e+31019 8.71351e+31017 -3.22788e+31019
4.60207e+31019 9.63565e+31017 -3.56949e+31019
-5.83403e+31019 -1.22151e+31018 4.52503e+31019
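To see why the conversion matters, here is a minimal sketch (assuming a recent Julia 1.x) that compares plain Float64 with BigFloat on the same matrix. It also shows BigInt for the case of integer edge weights. The Fibonacci matrix is only an illustration I picked, not something from the question. BigFloat uses 256 bits of precision by default; setprecision can change that if you need more accuracy or more speed.

```julia
using LinearAlgebra   # matrix ^ Integer lives here

A = [1.5 2 -4; 3 -1 -6; -10 2.3 4]

# In Float64 the entries overflow (largest finite value is about 1.8e308),
# so the result ends up full of Inf/NaN.
Af = A^32000

# Converting to BigFloat first keeps the huge exponents representable.
Ab = BigFloat.(A)^32000

# For integer edge weights, BigInt gives exact results instead.
# [1 1; 1 0]^n holds Fibonacci numbers, which outgrow Int64 well before n = 100.
F = big.([1 1; 1 0])^100
```

The trade-off is speed: every BigFloat or BigInt operation is a heap-allocated arbitrary-precision operation, so this is much slower than Float64 for large matrices.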
What is the best way in Julia to vectorize a function along a specific axis? For example, sum up all the rows of a matrix. Is it possible with the dot notation?
sum.(ones(4,4))
Does not yield the desired result.
Try the dims keyword argument, which many functions that reduce over collections of values accept:
sum([1 2; 3 4], dims=2)
2×1 Matrix{Int64}:
3
7
# or
using Statistics
mean([1 2; 3 4], dims=1)
1×2 Matrix{Float64}:
2.0 3.0
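Note that the dims versions keep the reduced dimension, so you get a 2×1 or 1×2 matrix back. A small sketch of turning that into a plain Vector with vec or dropdims (both in Base):

```julia
using Statistics

A = [1 2; 3 4]
s  = sum(A, dims=2)                       # 2×1 Matrix: row sums 3 and 7
sv = vec(s)                               # plain Vector [3, 7]
sd = dropdims(s, dims=2)                  # also a Vector [3, 7]
m  = dropdims(mean(A, dims=1), dims=1)    # column means as a Vector [2.0, 3.0]
```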
There is already a standard function, mapslices, that looks like exactly what you need.
julia> mapslices(sum, ones(4, 4), dims = 2)
4×1 Matrix{Float64}:
4.0
4.0
4.0
4.0
You can find the documentation in the Julia manual, or by typing ?mapslices in the REPL.
If you want to use the dot notation as in your example, you have to broadcast over a collection of rows, not over the matrix itself. Otherwise sum is applied to each element, which just gives back the same matrix. eachrow and eachcol produce the rows and columns, respectively.
julia> sum.(eachrow(ones(4, 4)))
4-element Vector{Float64}:
4.0
4.0
4.0
4.0
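For completeness, a sketch of the same ideas with a function that does not reduce to a scalar, plus eachcol (eachrow and eachcol need Julia 1.1 or later). The matrix A is just an example:

```julia
A = [3 1 2; 9 7 8]

# mapslices also accepts functions that return a vector: sort each row, descending
sorted_rows = mapslices(r -> sort(r, rev=true), A, dims=2)

# eachcol works like eachrow, but iterates over columns
colsums = sum.(eachcol(A))

# any scalar-returning function can be broadcast over the rows
rowmax = maximum.(eachrow(A))
```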
EDIT: I tried to suggest a more general solution, but where that option is available I would recommend using Andre's answer.
I want to turn an array like this
[1,2,3,4,5]
into a lagged version
[missing,1,2,3,4] # lag 1
[missing,missing,1,2,3] # lag 2
or a lead version
[2,3,4,5,missing] # lead 1
[3,4,5,missing,missing] # lead 2
As Julia is designed for scientific computing, there must be something like this, right?
Add the ShiftedArrays package. See: https://discourse.julialang.org/t/ann-shiftedarrays-and-support-for-shiftedarrays-in-groupederrors/9162
Quoting from the above:
lag, lead functions, to shift an array and add missing (or a custom default value in the latest not yet released version) where the data is not available, or circshift for shifting circularly in a lazy (non allocating) way:
julia> v = [1.2, 2.3, 3.4]
3-element Array{Float64,1}:
1.2
2.3
3.4
julia> lag(v)
3-element ShiftedArrays.ShiftedArray{Float64,Missings.Missing,1,Array{Float64,1}}:
missing
1.2
2.3
Note that the ShiftedArrays version of lag keeps the array size the same, which matches the examples in the question. If you instead want to keep every original value and let the array grow by n, a short function does that:
biglag(v, n) = lag(vcat(v, v[1:n]), n)
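If you would rather not add a dependency, here is a minimal Base-only sketch of lag and lead that keeps the length and pads with missing. The names mylag and mylead are hypothetical helpers, not functions from ShiftedArrays, and a negative n is rejected:

```julia
# Hypothetical Base-only helpers (not the ShiftedArrays API).
# Both keep the input length and pad with `missing`.
function mylag(v::AbstractVector, n::Integer = 1)
    n >= 0 || throw(ArgumentError("n must be non-negative"))
    w = collect(v)                      # 1-based copy, works for any AbstractVector
    len = length(w)
    k = min(n, len)                     # shifting by more than len gives all missing
    out = Vector{Union{Missing, eltype(w)}}(missing, len)
    out[k+1:len] = w[1:len-k]           # values move k places to the right
    return out
end

function mylead(v::AbstractVector, n::Integer = 1)
    n >= 0 || throw(ArgumentError("n must be non-negative"))
    w = collect(v)
    len = length(w)
    k = min(n, len)
    out = Vector{Union{Missing, eltype(w)}}(missing, len)
    out[1:len-k] = w[k+1:len]           # values move k places to the left
    return out
end

mylag([1, 2, 3, 4, 5], 2)    # [missing, missing, 1, 2, 3]
mylead([1, 2, 3, 4, 5], 1)   # [2, 3, 4, 5, missing]
```

Unlike ShiftedArrays, this allocates a new vector on every call instead of returning a lazy wrapper.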
I picked up Julia to do some numerical analysis stuff and was trying to implement a full pivot LU decomposition (as in, trying to get an LU decomposition that is as stable as possible). I thought that the best way of doing so was finding the maximum value for each column and then reordering the columns in descending order of their maximum values.
Is there a way of avoiding swapping every element of two columns and instead doing something like changing two references/pointers?
Following up on @longemen3000's answer, you can use views to swap columns. For example:
julia> A = reshape(1:12, 3, 4)
3×4 reshape(::UnitRange{Int64}, 3, 4) with eltype Int64:
1 4 7 10
2 5 8 11
3 6 9 12
julia> V = view(A, :, [3,2,4,1])
3×4 view(reshape(::UnitRange{Int64}, 3, 4), :, [3, 2, 4, 1]) with eltype Int64:
7 4 10 1
8 5 11 2
9 6 12 3
That said, whether this is a good strategy depends on access patterns. If you'll use elements of V once or a few times, this view strategy is a good one. In contrast, if you access elements of V many times, you may be better off making a copy or moving values in-place, since that's a price you pay once whereas here you pay an indirection cost every time you access a value.
Just for "completeness", in case you actually want to swap columns in-place,
function swapcols!(X::AbstractMatrix, i::Integer, j::Integer)
    # @inbounds skips bounds checks, so i and j must be valid column indices
    @inbounds for k = 1:size(X,1)
        X[k,i], X[k,j] = X[k,j], X[k,i]
    end
    return X
end
is simple and fast.
In fact, in an individual benchmark for small matrices this is even faster than the view approach mentioned in the other answers (views aren't always free):
julia> using BenchmarkTools
julia> A = rand(1:10,4,4);
julia> @btime view($A, :, $([3,2,1,4]));
31.919 ns (3 allocations: 112 bytes)
julia> @btime swapcols!($A, 1,3);
8.107 ns (0 allocations: 0 bytes)
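To make the copy-versus-view trade-off concrete, here is a sketch that builds both from the same permutation and also does an in-place swap. swapcols_checked! is a hypothetical variant of swapcols! above that checks the column indices once before the @inbounds loop. Note that reshape(1:12, 3, 4) on its own is a read-only reshaped range, so the sketch collects it into a mutable Matrix first:

```julia
# A mutable Matrix; reshape(1:12, 3, 4) alone would be a read-only reshaped range.
A = reshape(collect(1:12), 3, 4)
p = [3, 2, 4, 1]

V = view(A, :, p)   # lazy: every access goes through the index vector p
C = A[:, p]         # eager copy: contiguous columns, no indirection afterwards

# Hypothetical in-place swap: validate indices once, then skip per-element checks.
function swapcols_checked!(X::AbstractMatrix, i::Integer, j::Integer)
    checkbounds(X, :, i)
    checkbounds(X, :, j)
    @inbounds for k in axes(X, 1)
        X[k, i], X[k, j] = X[k, j], X[k, i]
    end
    return X
end

B = swapcols_checked!(copy(A), 1, 3)   # columns of B are now 3, 2, 1, 4
```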
In Julia there is the @view macro, which creates an array that is just a reference into another array, for example:
A = [1 2;3 4]
Aview = @view A[:,1] # view of the first column
Aview[1] = 10
julia> A
2×2 Array{Int64,2}:
10 2
3 4
With that said, when working with concrete number types (Float64, Int64, etc.), Julia stores arrays as contiguous blocks of memory holding the direct binary representation of the numbers. That is, a Julia array of numbers is not an array of pointers where each element points to a value. Only when the element type has no fixed inline representation (an abstract element type such as Any, or a mutable struct, for example) is an array of pointers used.
I'm not a computer science expert, but I have observed that it is better to have your data tightly packed than to use a lot of pointers when doing number crunching.
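You can ask Julia whether a type has a fixed inline representation with isbitstype. A small sketch; PointImm and PointMut are made-up example types:

```julia
struct PointImm           # immutable with isbits fields: stored inline in arrays
    x::Float64
    y::Float64
end

mutable struct PointMut   # mutable: arrays hold references (pointers) to each object
    x::Float64
    y::Float64
end

flags = (isbitstype(Float64), isbitstype(PointImm), isbitstype(PointMut), isbitstype(Any))
```

So a Vector{PointImm} is one packed block of Float64 pairs, while a Vector{PointMut} or Vector{Any} is a block of pointers to separately allocated objects.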
A different case is sparse arrays. Julia's basic sparse representation stores an array of indices and an array of values, so there you can simply swap indices instead of copying the values.
I have seen online in a few places the solution
a = [1 2 3; 4 5 Inf]
a[isinf(a)] = NaN
But this gives me an error on Julia 1.0.1:
ERROR: MethodError: no method matching isinf(::Array{Float64,2})
Closest candidates are:
isinf(::BigFloat) at mpfr.jl:851
isinf(::Missing) at missing.jl:79
isinf(::ForwardDiff.Dual) at <path on my local machine>
What gives?
As an additional comment, a standard function to perform this action is replace!. You can use it like this:
julia> a = [1 2 3; 4 5 Inf]
2×3 Array{Float64,2}:
1.0 2.0 3.0
4.0 5.0 Inf
julia> replace!(a, Inf=>NaN)
2×3 Array{Float64,2}:
1.0 2.0 3.0
4.0 5.0 NaN
It will perform better than broadcasting for large arrays.
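One caveat worth knowing: replace! compares values exactly, so Inf => NaN does not touch -Inf, whereas isinf matches both signs. A short sketch with an example vector of my own:

```julia
a = [1.0, -Inf, 3.0, Inf]

b = replace!(copy(a), Inf => NaN)                # only +Inf is replaced
c = replace!(copy(a), Inf => NaN, -Inf => NaN)   # several pairs catch both signs
d = replace!(x -> isinf(x) ? NaN : x, copy(a))   # function form, like isinf
e = copy(a); e[isinf.(e)] .= NaN                 # broadcasting version from below
```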
If you really need speed, you can write a simple function like this:
function inf2nan(x)
    for i in eachindex(x)
        # eachindex guarantees valid indices, so @inbounds is safe here
        @inbounds x[i] = ifelse(isinf(x[i]), NaN, x[i])
    end
    return x
end
Now let us simply compare the performance of the three options:
julia> function bench()
           x = fill(Inf, 10^8)
           @time x[isinf.(x)] .= NaN
           x = fill(Inf, 10^8)
           @time replace!(x, Inf=>NaN)
           x = fill(Inf, 10^8)
           @time inf2nan(x)
       end
bench (generic function with 1 method)
julia> bench()
0.980434 seconds (9 allocations: 774.865 MiB, 0.16% gc time)
0.183578 seconds
0.109929 seconds
julia> bench()
0.971408 seconds (9 allocations: 774.865 MiB, 0.03% gc time)
0.184163 seconds
0.102161 seconds
EDIT: For the most performant approaches to this problem see the excellent answer of @BogumilKaminski. This answer addresses the more general question of why isinf and related functions do not work on arrays anymore.
You are running into the more general issue that lots of functions that worked on arrays pre-v1.0 no longer work on arrays in v1.0 because you are supposed to be using broadcasting. The correct solution for v1.0 is:
a[isinf.(a)] .= NaN
I'm actually broadcasting in two places here. Firstly, we broadcast isinf over the array a, but we are also broadcasting the scalar NaN on the RHS to all indexed locations in the array on the LHS via .=. In general, the dot broadcasting notation is incredibly flexible and performant, and one of my favorite features of the latest iteration of Julia.
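If you want to avoid the temporary Bool mask that isinf.(a) allocates, a sketch of a single fused, in-place broadcast using ifelse:

```julia
a = [1 2 3; 4 5 Inf]

# All dotted calls fuse into one loop that writes straight back into `a`
a .= ifelse.(isinf.(a), NaN, a)
```

The same thing can be written as @. a = ifelse(isinf(a), NaN, a).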
You are passing your entire array to isinf, but isinf works on numbers, not arrays. Try this (note that it builds a new array rather than modifying a):
[isinf(i) ? NaN : i for i in a]
I'm trying to use mean(A,1) to get the mean row of a matrix A, but am getting an error.
For example, try running the command mean(eye(3), 1).
This gives the error no method mean(Array{Float64,2},Int32).
The only documentation I can find for the mean function is here:
http://docs.julialang.org/en/release-0.1/stdlib/base/#statistics
mean(v[, region])
Compute the mean of whole array v, or optionally along the dimensions in region.
What is the region parameter?
EDIT: for Julia 0.7 and higher, write this as mean(v, dims=1).
julia> using Statistics
julia> A = [1 2 3; 4 5 6]
2×3 Array{Int64,2}:
1 2 3
4 5 6
# Column means
julia> mean(A, dims=1)
1×3 Array{Float64,2}:
2.5 3.5 4.5
# Row means
julia> mean(A, dims=2)
2×1 Array{Float64,2}:
2.0
5.0
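On current Julia, eye has been removed as well (it was deprecated in 0.7). A sketch of the equivalent of mean(eye(3), 1), using the identity UniformScaling I from LinearAlgebra:

```julia
using Statistics, LinearAlgebra

M = Matrix(1.0I, 3, 3)             # modern replacement for eye(3)
colmeans = mean(M, dims=1)         # 1×3 Matrix, each entry 1/3
rowmeans = vec(mean(M, dims=2))    # row means as a plain Vector
```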
It must be something with your installation; mean(eye(3),1) works just fine here.