Copy or clone a collection in Julia - collections

I have created a one-dimensional array(vector) in Julia, namely, a=[1, 2, 3, 4, 5]. Then I want to create a new vector b, where b has exactly same elements in a, i.e b=[1, 2, 3, 4, 5].
It seems that directly use b = a just create a pointer for the original collection, which means if I modify b and a is mutable, the modification will also be reflected in a. For example, if I use !pop(b), then b=[1, 2, 3, 4] and a=[1, 2, 3, 4].
I am wondering if there is a official function to merely copy or clone the collection, which the change in b will not happen in a. I find a solution is use b = collect(a). I would appreciate that someone provide some other approaches.

b=copy(a)
Should do what you want.
methods(copy) will give you a list of methods for copy, which will tell you what types of a this will work for.
julia> methods(copy)
# 32 methods for generic function "copy":
copy(r::Range{T}) at range.jl:324
copy(e::Expr) at expr.jl:34
copy(s::SymbolNode) at expr.jl:38
copy(x::Union{AbstractString,DataType,Function,LambdaStaticData,Number,QuoteNode,Symbol,TopNode,Tuple,Union}) at operators.jl:194
copy(V::SubArray{T,N,P<:AbstractArray{T,N},I<:Tuple{Vararg{Union{AbstractArray{T,1},Colon,Int64}}},LD}) at subarray.jl:29
copy(a::Array{T,N}) at array.jl:100
copy(M::SymTridiagonal{T}) at linalg/tridiag.jl:63
copy(M::Tridiagonal{T}) at linalg/tridiag.jl:320
copy{T,S}(A::LowerTriangular{T,S}) at linalg/triangular.jl:36
copy{T,S}(A::Base.LinAlg.UnitLowerTriangular{T,S}) at linalg/triangular.jl:36
copy{T,S}(A::UpperTriangular{T,S}) at linalg/triangular.jl:36
copy{T,S}(A::Base.LinAlg.UnitUpperTriangular{T,S}) at linalg/triangular.jl:36
copy{T,S}(A::Symmetric{T,S}) at linalg/symmetric.jl:38
copy{T,S}(A::Hermitian{T,S}) at linalg/symmetric.jl:39
copy(M::Bidiagonal{T}) at linalg/bidiag.jl:113
copy(S::SparseMatrixCSC{Tv,Ti<:Integer}) at sparse/sparsematrix.jl:184
copy{Tv<:Float64}(A::Base.SparseMatrix.CHOLMOD.Sparse{Tv<:Float64}, stype::Integer, mode::Integer) at sparse/cholmod.jl:583
copy(A::Base.SparseMatrix.CHOLMOD.Dense{T<:Union{Complex{Float64},Float64}}) at sparse/cholmod.jl:1068
copy(A::Base.SparseMatrix.CHOLMOD.Sparse{Tv<:Union{Complex{Float64},Float64}}) at sparse/cholmod.jl:1069
copy(a::AbstractArray{T,N}) at abstractarray.jl:349
copy(s::IntSet) at intset.jl:34
copy(o::ObjectIdDict) at dict.jl:358
copy(d::Dict{K,V}) at dict.jl:414
copy(a::Associative{K,V}) at dict.jl:204
copy(s::Set{T}) at set.jl:35
copy(b::Base.AbstractIOBuffer{T<:AbstractArray{UInt8,1}}) at iobuffer.jl:38
copy(r::Regex) at regex.jl:65
copy(::Base.DevNullStream) at process.jl:98
copy(C::Base.LinAlg.Cholesky{T,S<:AbstractArray{T,2}}) at linalg/cholesky.jl:160
copy(C::Base.LinAlg.CholeskyPivoted{T,S<:AbstractArray{T,2}}) at linalg/cholesky.jl:161
copy(J::UniformScaling{T<:Number}) at linalg/uniformscaling.jl:17
copy(A::Base.SparseMatrix.CHOLMOD.Factor{Tv}) at sparse/cholmod.jl:1070

You can use the copy and deepcopy functions:
help?> copy
search: copy copy! copysign deepcopy unsafe_copy! cospi complex Complex complex64 complex32 complex128 complement
copy(x)
Create a shallow copy of x: the outer structure is copied, but not all internal values. For example, copying an
array produces a new array with identically-same elements as the original.
help?> deepcopy
search: deepcopy
deepcopy(x)
Create a deep copy of x: everything is copied recursively, resulting in a fully independent object. For example,
deep-copying an array produces a new array whose elements are deep copies of the original elements. Calling deepcopy
on an object should generally have the same effect as serializing and then deserializing it.
As a special case, functions can only be actually deep-copied if they are anonymous, otherwise they are just copied.
The difference is only relevant in the case of closures, i.e. functions which may contain hidden internal
references.
While it isn't normally necessary, user-defined types can override the default deepcopy behavior by defining a
specialized version of the function deepcopy_internal(x::T, dict::ObjectIdDict) (which shouldn't otherwise be used),
where T is the type to be specialized for, and dict keeps track of objects copied so far within the recursion.
Within the definition, deepcopy_internal should be used in place of deepcopy, and the dict variable should be
updated as appropriate before returning.
Like this:
julia> a = Any[1, 2, 3, [4, 5, 6]]
4-element Array{Any,1}:
1
2
3
[4,5,6]
julia> b = copy(a); c = deepcopy(a);
julia> a[4][1] = 42;
julia> b # copied
4-element Array{Any,1}:
1
2
3
[42,5,6]
julia> c # deep copied
4-element Array{Any,1}:
1
2
3
[4,5,6]
Notice that the help system hints of the existence of other copy related functions.

Related

Why (; [(:x, 1), (:y, 2)]...) creates a NamedTuple?

I'm still learning Julia, and I recently came across the following code excerpt that flummoxed me:
res = (; [(:x, 10), (:y, 20)]...) # why the semicolon in front?
println(res) # (x = 10, y = 20)
println(typeof(res)) # NamedTuple{(:x, :y), Tuple{Int64, Int64}}
I understand the "splat" operator ..., but what happens when the semicolon appear first in a tuple? In other words, how does putting a semicolon in (; [(:x, 10), (:y, 20)]...) create a NamedTuple? Is this some undocumented feature/trick?
Thanks for any pointers.
Yes, this is actually a documented feature, but perhaps not a very well known one. As the documentation for NamedTuple notes:
help?> NamedTuple
search: NamedTuple #NamedTuple
NamedTuple
NamedTuples are, as their name suggests, named Tuples. That is, they're a tuple-like
collection of values, where each entry has a unique name, represented as a Symbol.
Like Tuples, NamedTuples are immutable; neither the names nor the values can be
modified in place after construction.
Accessing the value associated with a name in a named tuple can be done using field
access syntax, e.g. x.a, or using getindex, e.g. x[:a]. A tuple of the names can be
obtained using keys, and a tuple of the values can be obtained using values.
[... some other non-relevant parts of the documentation omitted ...]
In a similar fashion as to how one can define keyword arguments programmatically, a
named tuple can be created by giving a pair name::Symbol => value or splatting an
iterator yielding such pairs after a semicolon inside a tuple literal:
julia> (; :a => 1)
(a = 1,)
julia> keys = (:a, :b, :c); values = (1, 2, 3);
julia> (; zip(keys, values)...)
(a = 1, b = 2, c = 3)
As in keyword arguments, identifiers and dot expressions imply names:
julia> x = 0
0
julia> t = (; x)
(x = 0,)
julia> (; t.x)
(x = 0,)

Is it possible to pre-allocate array for matrix factorization?

My question is instead of F = svd(A), can one first allocate an appropriate memory for an SVD structure, and then do F .= svd(A) ?
What I had in mind is something like the following:
function main()
F = Vector{SVD}(undef,10)
# how to preallocate F?
test(F)
end
function test(F::Vector{SVD})
for i in 1:10
F .= svd(rand(3,3))
end
end
Your code almost works. But what you probably wanted was this:
using LinearAlgebra
function main()
F = Vector{SVD}(undef, 10)
test(F)
end
function test(F::Vector{SVD})
for i in 1:10
F[i] = svd(rand(3, 3))
end
return F
end
The line that you had in the for loop was this:
F .= svd(rand(3,3))
which does the same operation on every loop, since you were not indexing into F. In particular, this operation was trying to broadcast a single SVD object into all the fields of F on each iteration of the loop. (And that broadcast operation failed because by default structs are treated as iterable objects with a length method, but SVD does not have a length method.)
However, I would recommend against pre-allocating a vector in this situation. First, let's look at the type of F:
julia> typeof(Vector{SVD}(undef, 10))
Array{SVD,1}
The problem with this vector is that it is parameterized by an abstract type. There is a section in the Performance Tips chapter of the manual that advises against this. SVD is an abstract type because the types of its parameters have not been specified. To make it concrete, you need to specify the types of the parameters, like this:
julia> SVD{Float64,Float64,Array{Float64,2}}
SVD{Float64,Float64,Array{Float64,2}}
julia> Vector{SVD{Float64,Float64,Array{Float64,2}}}(undef, 2)
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
#undef
#undef
As you can see, it is difficult to correctly specify the concrete type when you are working with complicated types like SVD. Additionally, if you do so, your code will not be as generic as it could be.
A better approach for a problem like this is to use mapping, broadcasting, or a list comprehension. Then the correct output type will automatically be inferred. Here are some examples:
List comprehension
julia> [svd(rand(3, 3)) for _ in 1:2]
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
SVD{Float64,Float64,Array{Float64,2}}([-0.6357040496635746 -0.2941425771794837 -0.7136949667270628; -0.45459999623274916 -0.6045700314848496 0.654090147040599; -0.6238743500629883 0.7402534845042064 0.2506104028424691], [1.4535849689665463, 0.7212190827260345, 0.05010669163393896], [-0.5975505057447164 -0.588792736048385 -0.5442945039782142; 0.7619724725128861 -0.6283345569895092 -0.15682358121595258; -0.2496624605679292 -0.5084474392397449 0.8241054891903787])
SVD{Float64,Float64,Array{Float64,2}}([-0.5593632049776268 0.654338345992878 -0.5088753618327984; -0.6687620652652163 -0.7189576326033171 -0.18936003428293915; -0.4897653570633183 0.23439550227070827 0.8397551092645418], [1.8461274187259178, 0.21226179692488983, 0.14194607536315287], [-0.29089551972856004 -0.7086270946133293 -0.6428276887173754; -0.9203610429640889 0.023709029028269546 0.390350397126212; 0.2613720474647311 -0.7051847436823973 0.6590896221923739])
Map
julia> map(_ -> svd(rand(3, 3)), 1:2)
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
SVD{Float64,Float64,Array{Float64,2}}([-0.5807809149601634 0.5635242755434755 0.5874809951745127; -0.6884131975465821 0.0451903888051729 -0.7239095925620322; -0.43448912329507794 -0.8248625459025509 0.3616918330643316], [1.488618654040125, 0.4122166626927311, 0.004235624485479941], [-0.6721098925787947 -0.2684664121709399 -0.6900681689759235; -0.7384292974335966 0.31185073633575333 0.5978890289498324; -0.05468514413847799 -0.9114136842196914 0.4078414290231468])
SVD{Float64,Float64,Array{Float64,2}}([-0.3677873424759118 0.8090638526628051 -0.4584191892023337; -0.43071684640222546 -0.5851169278783189 -0.6871107472129654; -0.8241452960126802 -0.055261768200600137 0.5636760310989947], [1.6862363968739773, 0.5899255050748418, 0.24246688716190598], [-0.3751742784957875 -0.7172409091515735 -0.5872050229643736; 0.8600668700980193 -0.505618838823938 0.06807766730822862; -0.3457300098559026 -0.4794945964927631 0.8065703268899])
Broadcasting
julia> g = (rand(3, 3) for _ in 1:2)
Base.Generator{UnitRange{Int64},var"#17#18"}(var"#17#18"(), 1:2)
julia> svd.(g)
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
SVD{Float64,Float64,Array{Float64,2}}([-0.7988295268840152 0.5443221484534134 -0.256095266807727; -0.5436890668169485 -0.8354777569473182 -0.0798693700362902; -0.257436566171119 0.07543418554831638 0.963346302244777], [1.8188722412547844, 0.3934389096422389, 0.2020398396772306], [-0.7147404794808727 -0.37763644211761316 -0.5886737335538281; -0.6944558966482991 0.4830041206449164 0.5333273169925189; -0.08292800854873916 -0.7899985677359054 0.607474450798845])
SVD{Float64,Float64,Array{Float64,2}}([-0.5910620103531503 0.3599866268397522 0.7218416228050514; -0.7367495542691711 0.12340124384185132 -0.664809918173956; -0.3283988340440176 -0.9247603805931685 0.1922821996018057], [1.826019614357666, 0.5333148215847028, 0.11639139812894106], [-0.6415954756495915 -0.6888196183142843 -0.33746522643279503; -0.5845558664639438 0.7239484700883465 -0.3663236978948133; -0.4966383841474222 0.037764349353666515 0.8671356118331964])
Furthermore, mapping, broadcasting, and list comprehensions should be just as efficient as pre-allocating the vector. If you're doing a simple mapping, then it's usually easier and more readable to use mapping, broadcasting, or list comprehensions. Pre-allocating vectors is a tool I reserve for writing custom algorithms from scratch.
A final note. In most cases, type parameters are considered an implementation detail and are not a part of the public API for a type. As such, it's best to use generic programming approaches that do not rely on fixing the types for type parameters. Of course there are some exceptions to this rule of thumb, like Array{T,N} and Dict{K,V}.
There's a differnent way of preallocation -- you can reuse the input array by always overwriting it, with both the rand call and svd's internal needs:
function test!(F::Vector{SVD})
A = Matrix{Float64}(undef, 3, 3)
for i in 1:10
rand!(A)
F[i] = svd!(A)
end
end
Cameron's advice still holds. I'd probably use something like
function test()
A = Matrix{Float64}(undef, 3, 3)
return map(1:10) do i
svd!(rand!(A))
end
end
given that the number of loops seems not be the critical part.

How to dispatch based on the type of any of the splatted args?

Consider an existing function in Base, which takes in a variable number of arguments of some abstract type T. I have defined a subtype S<:T and would like to write a method which dispatches if any of the arguments is my subtype S.
As an example, consider function Base.cat, with T being an AbstractArray and S being some MyCustomArray <: AbstractArray.
Desired behaviour:
julia> v = [1, 2, 3];
julia> cat(v, v, v, dims=2)
3×3 Array{Int64,2}:
1 1 1
2 2 2
3 3 3
julia> w = MyCustomArray([1,2,3])
julia> cat(v, v, w, dims=2)
"do something fancy"
Attempt:
function Base.cat(w::MyCustomArray, a::AbstractArray...; dims)
pritnln("do something fancy")
end
But this only works if the first argument is MyCustomArray.
What is an elegant way of achieving this?
I would say that it is not possible to do it cleanly without type piracy (but if it is possible I would also like to learn how).
For example consider cat that you asked about. It has one very general signature in Base (actually not requiring A to be AbstractArray as you write):
julia> methods(cat)
# 1 method for generic function "cat":
[1] cat(A...; dims) in Base at abstractarray.jl:1654
You could write a specific method:
Base.cat(A::AbstractArray...; dims) = ...
and check if any of elements of A is your special array, but this would be type piracy.
Now the problem is that you cannot even write Union{S, T} as since S <: T it will be resolved as just T.
This would mean that you would have to use S explicitly in the signature, but then even:
f(::S, ::T) = ...
f(::T, ::S) = ...
is problematic and a compiler will ask you to define f(::S, ::S) as the above definitions lead to dispatch ambiguity. So, even if you wanted to limit the number of varargs to some maximum number you would have to annotate types for all divisions of A into subsets to avoid dispatch ambiguity (which is doable using macros, but grows the number of required methods exponentially).
For general usage, I concur with Bogumił, but let me make an additional comment. If you have control over how cat is called, you can at least write some kind of trait-dispatch code:
struct MyCustomArray{T, N} <: AbstractArray{T, N}
x::Array{T, N}
end
HasCustom() = Val(false)
HasCustom(::MyCustomArray, rest...) = Val(true)
HasCustom(::AbstractArray, rest...) = HasCustom(rest...)
# `IsCustom` or something would be more elegant, but `Val` is quicker for now
Base.cat(::Val{true}, args...; dims) = println("something fancy")
Base.cat(::Val{false}, args...; dims) = cat(args...; dims=dims)
And the compiler is cool enough to optimize that away:
julia> args = (v, v, w);
julia> #code_warntype cat(HasCustom(args...), args...; dims=2);
Variables
#self#::Core.Compiler.Const(cat, false)
#unused#::Core.Compiler.Const(Val{true}(), false)
args::Tuple{Array{Int64,1},Array{Int64,1},MyCustomArray{Int64,1}}
Body::Nothing
1 ─ %1 = Main.println("something fancy")::Core.Compiler.Const(nothing, false)
└── return %1
If you don't have control over calls to cat, the only resort I can think of to make the above technique work is to overdub methods containing such call, to replace matching calls by the custom implementation. In which case you don't even need to overload cat, but can directly replace it by some mycat doing your fancy stuff.

custom ordering in Julia SortedSet

In the Julia documentation for SortedSet, there is a reference to "ordering objects", which can be used in the constructor. I'm working on a project where I need to implement a custom sort on a set of structs. I'd like to use a functor for this, since there is additional state I need for my comparisons.
Here is a somewhat simplified version of the problem I want to solve. I have two structs, Point and Edge:
struct Point{T<:Real}
x::T
y::T
end
struct Edge{T<:Real}
first::Point{T}
second::Point{T}
end
I have a Point called 'vantage', and I want to order Edges by their distance from 'vantage'. Conceptually:
function edge_ordering(vantage::Point, e1::Edge, e2::Edge)
d1 = distance(vantage, e1)
d2 = distance(vantage, e2)
return d1 < d2
end
Are "ordering objects" functors (or functor-ish)? Is there some other conventional way of doing this sort of ordering in Julia?
An Ordering object can contain fields, you can store your state there. This is an example of a Remainder Ordering which sort integers by it's remainder:
using DataStructures
struct RemainderOrdering <: Base.Order.Ordering
r::Int
end
import Base.Order.lt
lt(o::RemainderOrdering, a, b) = isless(a % o.r, b % o.r)
SortedSet(RemainderOrdering(3), [1,2,3]) # 3, 1, 2
I'm not sure how it is related to functors, so I may misunderstand your question. This is an alternative implementation that defines an Ordering functor. I made explanations in comments.
using DataStructures
import Base: isless, map
struct Foo # this is your structure
x::Int
end
struct PrimaryOrdered{T, F} # this is the functor, F is the additional state.
x::T
end
map(f::Base.Callable, x::T) where {T <: PrimaryOrdered} = T(f(x.x)) # this makes it a functor?
isless(x::PrimaryOrdered{T, F}, y::PrimaryOrdered{T, F}) where {T, F} =
F(x.x) < F(y.x) # do comparison with your additional state, here I assume it is a closure
const OrderR3 = PrimaryOrdered{Foo, x -> x.x % 3} # a order that order by the remainder by 3
a = OrderR3(Foo(2))
f(x::Foo) = Foo(x.x + 1) # this is a Foo -> Foo
a = map(f, a) # you can map f on a OrderR3 object
a == OrderR3(Foo(33)) # true
a = map(OrderR3 ∘ Foo, [1, 2, 3])
s = SortedSet(a)
map(x->x.x, s) # Foo[3, 1, 2]
As always, an MWE is important for a question to be understood better. You can include a piece of code to show how you want to construct and use your SortedSet, instead of the vague "state" and "functor".
The sorting is based on the method isless for the type. So for instance if you have a type in which you want to sort on the b field. For instance you can do
struct Foo{T}
a::T
b::T
end
Base.:isless(x::T,y::T) where {T<:Foo} = isless(x.b,y.b)
s=[Foo(1,2),Foo(2,-1)]
res=SortedSet(s)
#SortedSet(Foo[Foo(2, -1), Foo(1, 2)],
#Base.Order.ForwardOrdering())
Tuples are also sorted in order, so you can also use
sort(s,by=x->(x.b,x.a)) to sort by b,thena without having to define isless for the type.

Transform nested array into new dimension

Given an array as follows:
A = Array{Array{Int}}(2,2)
A[1,1] = [1,2]
A[1,2] = [3,4]
A[2,1] = [5,6]
A[2,2] = [7,8]
We then have that A is a 2x2 array with elements of type Array{Int}:
2×2 Array{Array{Int64,N} where N,2}:
[1, 2] [3, 4]
[5, 6] [7, 8]
It is possible to access the entries with e.g. A[1,2] but A[1,2,2] would not work since the third dimension is not present in A. However, A[1,2][2] works, since A[1,2] returns an array of length 2.
The question is then, what is a nice way to convert A into a 3-dimensional array, B, so that B[i,j,k] refers the the i,j-th array and the k-th element in that array. E.g. B[2,1,2] = 6.
There is a straightforward way to do this using 3 nested loops and reconstructing the array, element-by-element, but I'm hoping there is a nicer construction. (Some application of cat perhaps?)
You can construct a 3-d array from A using an array comprehension
julia> B = [ A[i,j][k] for i=1:2, j=:1:2, k=1:2 ]
2×2×2 Array{Int64,3}:
[:, :, 1] =
1 3
5 7
[:, :, 2] =
2 4
6 8
julia> B[2,1,2]
6
However a more general solution would be to overload the getindex function for arrays with the same type of A. This is more efficient since there is no need to copy the original data.
julia> import Base.getindex
julia> getindex(A::Array{Array{Int}}, i::Int, j::Int, k::Int) = A[i,j][k]
getindex (generic function with 179 methods)
julia> A[2,1,2]
6
With thanks to Dan Getz's comments, I think the following works well and is succinct:
cat(3,(getindex.(A,i) for i=1:2)...)
where 2 is the length of the nested array. It would also work for higher dimensions.
permutedims(reshape(collect(Base.Iterators.flatten(A)), (2,2,2)), (2,3,1))
also does the job and appears to be faster than the accepted cat() answer for me.
EDIT: I'm sorry, I just saw that this has already been suggested in the comments.

Resources