Is it possible to pre-allocate array for matrix factorization? - julia

My question is: instead of F = svd(A), can one first allocate appropriate memory for an SVD structure, and then do F .= svd(A)?
What I had in mind is something like the following:
function main()
F = Vector{SVD}(undef,10)
# how to preallocate F?
test(F)
end
function test(F::Vector{SVD})
for i in 1:10
F .= svd(rand(3,3))
end
end

Your code almost works. But what you probably wanted was this:
using LinearAlgebra
function main()
F = Vector{SVD}(undef, 10)
test(F)
end
function test(F::Vector{SVD})
for i in 1:10
F[i] = svd(rand(3, 3))
end
return F
end
The line that you had in the for loop was this:
F .= svd(rand(3,3))
which does the same operation on every iteration, since you were not indexing into F. In particular, this operation was trying to broadcast a single SVD object into all the elements of F on each iteration of the loop. (And that broadcast operation failed because broadcasting by default treats its arguments as collections and asks for their length, but SVD does not have a length method.)
However, I would recommend against pre-allocating a vector in this situation. First, let's look at the type of F:
julia> typeof(Vector{SVD}(undef, 10))
Array{SVD,1}
The problem with this vector is that its element type is not concrete. There is a section in the Performance Tips chapter of the manual that advises against this. Written without its type parameters, SVD is not a concrete type (it stands for the whole family of SVD{...} types). To make it concrete, you need to specify the types of the parameters, like this:
julia> SVD{Float64,Float64,Array{Float64,2}}
SVD{Float64,Float64,Array{Float64,2}}
julia> Vector{SVD{Float64,Float64,Array{Float64,2}}}(undef, 2)
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
#undef
#undef
As you can see, it is difficult to correctly specify the concrete type when you are working with complicated types like SVD. Additionally, if you do so, your code will not be as generic as it could be.
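If you do want a preallocated, concretely typed vector without spelling out the parameters by hand, one option is to take the type from a sample factorization. This is only a sketch: it assumes every matrix has the same element type as the sample, and the names T and F are just for illustration.

```julia
using LinearAlgebra

# Ask Julia for the concrete type instead of writing the parameters by hand.
T = typeof(svd(rand(3, 3)))

isconcretetype(SVD)  # false: the type parameters are unspecified
isconcretetype(T)    # true

# Preallocate with the concrete element type and fill by index.
F = Vector{T}(undef, 10)
for i in eachindex(F)
    F[i] = svd(rand(3, 3))
end
```

This keeps the explicit loop from the question while still giving the vector a concrete element type.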
A better approach for a problem like this is to use mapping, broadcasting, or a list comprehension. Then the correct output type will automatically be inferred. Here are some examples:
List comprehension
julia> [svd(rand(3, 3)) for _ in 1:2]
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
SVD{Float64,Float64,Array{Float64,2}}([-0.6357040496635746 -0.2941425771794837 -0.7136949667270628; -0.45459999623274916 -0.6045700314848496 0.654090147040599; -0.6238743500629883 0.7402534845042064 0.2506104028424691], [1.4535849689665463, 0.7212190827260345, 0.05010669163393896], [-0.5975505057447164 -0.588792736048385 -0.5442945039782142; 0.7619724725128861 -0.6283345569895092 -0.15682358121595258; -0.2496624605679292 -0.5084474392397449 0.8241054891903787])
SVD{Float64,Float64,Array{Float64,2}}([-0.5593632049776268 0.654338345992878 -0.5088753618327984; -0.6687620652652163 -0.7189576326033171 -0.18936003428293915; -0.4897653570633183 0.23439550227070827 0.8397551092645418], [1.8461274187259178, 0.21226179692488983, 0.14194607536315287], [-0.29089551972856004 -0.7086270946133293 -0.6428276887173754; -0.9203610429640889 0.023709029028269546 0.390350397126212; 0.2613720474647311 -0.7051847436823973 0.6590896221923739])
Map
julia> map(_ -> svd(rand(3, 3)), 1:2)
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
SVD{Float64,Float64,Array{Float64,2}}([-0.5807809149601634 0.5635242755434755 0.5874809951745127; -0.6884131975465821 0.0451903888051729 -0.7239095925620322; -0.43448912329507794 -0.8248625459025509 0.3616918330643316], [1.488618654040125, 0.4122166626927311, 0.004235624485479941], [-0.6721098925787947 -0.2684664121709399 -0.6900681689759235; -0.7384292974335966 0.31185073633575333 0.5978890289498324; -0.05468514413847799 -0.9114136842196914 0.4078414290231468])
SVD{Float64,Float64,Array{Float64,2}}([-0.3677873424759118 0.8090638526628051 -0.4584191892023337; -0.43071684640222546 -0.5851169278783189 -0.6871107472129654; -0.8241452960126802 -0.055261768200600137 0.5636760310989947], [1.6862363968739773, 0.5899255050748418, 0.24246688716190598], [-0.3751742784957875 -0.7172409091515735 -0.5872050229643736; 0.8600668700980193 -0.505618838823938 0.06807766730822862; -0.3457300098559026 -0.4794945964927631 0.8065703268899])
Broadcasting
julia> g = (rand(3, 3) for _ in 1:2)
Base.Generator{UnitRange{Int64},var"#17#18"}(var"#17#18"(), 1:2)
julia> svd.(g)
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
SVD{Float64,Float64,Array{Float64,2}}([-0.7988295268840152 0.5443221484534134 -0.256095266807727; -0.5436890668169485 -0.8354777569473182 -0.0798693700362902; -0.257436566171119 0.07543418554831638 0.963346302244777], [1.8188722412547844, 0.3934389096422389, 0.2020398396772306], [-0.7147404794808727 -0.37763644211761316 -0.5886737335538281; -0.6944558966482991 0.4830041206449164 0.5333273169925189; -0.08292800854873916 -0.7899985677359054 0.607474450798845])
SVD{Float64,Float64,Array{Float64,2}}([-0.5910620103531503 0.3599866268397522 0.7218416228050514; -0.7367495542691711 0.12340124384185132 -0.664809918173956; -0.3283988340440176 -0.9247603805931685 0.1922821996018057], [1.826019614357666, 0.5333148215847028, 0.11639139812894106], [-0.6415954756495915 -0.6888196183142843 -0.33746522643279503; -0.5845558664639438 0.7239484700883465 -0.3663236978948133; -0.4966383841474222 0.037764349353666515 0.8671356118331964])
Furthermore, mapping, broadcasting, and list comprehensions should be just as efficient as pre-allocating the vector. If you're doing a simple mapping, then it's usually easier and more readable to use mapping, broadcasting, or list comprehensions. Pre-allocating vectors is a tool I reserve for writing custom algorithms from scratch.
A final note. In most cases, type parameters are considered an implementation detail and are not a part of the public API for a type. As such, it's best to use generic programming approaches that do not rely on fixing the types for type parameters. Of course there are some exceptions to this rule of thumb, like Array{T,N} and Dict{K,V}.

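There's a different way of preallocating: you can reuse the input array by overwriting it on every iteration, so that both the rand! call and the internal work of svd! operate on the same buffer:
using LinearAlgebra, Random  # Random provides rand!
function test!(F::Vector{SVD})
    A = Matrix{Float64}(undef, 3, 3)
    for i in 1:10
        rand!(A)        # refill A in place
        F[i] = svd!(A)  # svd! is allowed to overwrite A
    end
    return F
end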
Cameron's advice still holds. I'd probably use something like
function test()
A = Matrix{Float64}(undef, 3, 3)
return map(1:10) do i
svd!(rand!(A))
end
end
given that the number of loops seems not to be the critical part.

Related

Get names of keywords for Julia methods

I have a function like
function f(a = 1; first = 5, second = "asdf")
return a
end
Is there any way to programmatically return a vector with the names of the keyword arguments? Something like:
kwargs(f)
# returns [:first, :second]
I realise that this might be complicated by having multiple methods for a function name. But I was hoping this would still be possible if the exact method is specified. For instance:
kwargs(methods(f).ms[1])
# returns [:first, :second]
Just use Base.kwarg_decl()
julia> Base.kwarg_decl.(methods(f))
2-element Vector{Vector{Symbol}}:
[]
[:first, :second]
If you need the first parameter a as well you could also try:
julia> Base.method_argnames.(methods(f))
2-element Vector{Vector{Symbol}}:
[Symbol("#self#")]
[Symbol("#self#"), :a]
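To get a single list for the whole function, as the question's kwargs(f) asks for, the per-method results can be merged. The helper name kwnames below is made up for this sketch, and Base.kwarg_decl is an unexported internal, so its exact behavior may vary between Julia versions.

```julia
# Hypothetical helper (not part of Base): collect keyword names over all methods of f.
# Base.kwarg_decl is an unexported internal and may change between Julia versions.
kwnames(f) = unique(reduce(vcat, Base.kwarg_decl.(methods(f)); init = Symbol[]))

f(a = 1; first = 5, second = "asdf") = a

kwnames(f)
```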

How to dispatch based on the type of any of the splatted args?

Consider an existing function in Base, which takes in a variable number of arguments of some abstract type T. I have defined a subtype S<:T and would like to write a method which dispatches if any of the arguments is my subtype S.
As an example, consider function Base.cat, with T being an AbstractArray and S being some MyCustomArray <: AbstractArray.
Desired behaviour:
julia> v = [1, 2, 3];
julia> cat(v, v, v, dims=2)
3×3 Array{Int64,2}:
1 1 1
2 2 2
3 3 3
julia> w = MyCustomArray([1,2,3])
julia> cat(v, v, w, dims=2)
"do something fancy"
Attempt:
function Base.cat(w::MyCustomArray, a::AbstractArray...; dims)
println("do something fancy")
end
But this only works if the first argument is MyCustomArray.
What is an elegant way of achieving this?
I would say that it is not possible to do it cleanly without type piracy (but if it is possible I would also like to learn how).
For example consider cat that you asked about. It has one very general signature in Base (actually not requiring A to be AbstractArray as you write):
julia> methods(cat)
# 1 method for generic function "cat":
[1] cat(A...; dims) in Base at abstractarray.jl:1654
You could write a specific method:
Base.cat(A::AbstractArray...; dims) = ...
and check if any of elements of A is your special array, but this would be type piracy.
Now the problem is that you cannot even write Union{S, T}: since S <: T, it will be resolved as just T.
This would mean that you would have to use S explicitly in the signature, but then even:
f(::S, ::T) = ...
f(::T, ::S) = ...
is problematic, and Julia will ask you to define f(::S, ::S), as the above definitions lead to dispatch ambiguity. So, even if you wanted to limit the number of varargs to some maximum number, you would have to annotate types for all divisions of A into subsets to avoid dispatch ambiguity (which is doable using macros, but grows the number of required methods exponentially).
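Both points can be checked in a few lines. The types below are illustrative stand-ins for the S and T in the text, with R added as a second, unrelated subtype.

```julia
abstract type T end      # stand-in for the abstract type T in the text
struct S <: T end        # stand-in for the custom subtype S
struct R <: T end        # a second, unrelated subtype

Union{S, T} === T        # true: the union collapses to T

f(::S, ::T) = "S first"
f(::T, ::S) = "S second"

f(S(), R())              # fine: only the first method applies
f(R(), S())              # fine: only the second method applies

# Both methods apply and neither is more specific, so this call is ambiguous.
ambiguous = try
    f(S(), S())
    false
catch err
    err isa MethodError
end
```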
For general usage, I concur with Bogumił, but let me make an additional comment. If you have control over how cat is called, you can at least write some kind of trait-dispatch code:
struct MyCustomArray{T, N} <: AbstractArray{T, N}
x::Array{T, N}
end
HasCustom() = Val(false)
HasCustom(::MyCustomArray, rest...) = Val(true)
HasCustom(::AbstractArray, rest...) = HasCustom(rest...)
# `IsCustom` or something would be more elegant, but `Val` is quicker for now
Base.cat(::Val{true}, args...; dims) = println("something fancy")
Base.cat(::Val{false}, args...; dims) = cat(args...; dims=dims)
And the compiler is cool enough to optimize that away:
julia> args = (v, v, w);
julia> @code_warntype cat(HasCustom(args...), args...; dims=2);
Variables
#self#::Core.Compiler.Const(cat, false)
#unused#::Core.Compiler.Const(Val{true}(), false)
args::Tuple{Array{Int64,1},Array{Int64,1},MyCustomArray{Int64,1}}
Body::Nothing
1 ─ %1 = Main.println("something fancy")::Core.Compiler.Const(nothing, false)
└── return %1
If you don't have control over calls to cat, the only resort I can think of to make the above technique work is to overdub methods containing such a call, replacing matching calls with the custom implementation. In that case you don't even need to overload cat, but can directly replace it with some mycat doing your fancy stuff.
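A minimal sketch of the mycat idea, without adding any methods to Base.cat. The names hascustom, fancycat and mycat are invented for this example, and the fancy branch just returns a string as a placeholder.

```julia
struct MyCustomArray{T,N} <: AbstractArray{T,N}
    x::Array{T,N}
end
# Minimal array interface so the wrapper behaves like an array.
Base.size(a::MyCustomArray) = size(a.x)
Base.getindex(a::MyCustomArray, i::Int...) = a.x[i...]

# Recursively scan the arguments for a MyCustomArray.
hascustom() = false
hascustom(::MyCustomArray, rest...) = true
hascustom(::Any, rest...) = hascustom(rest...)

# Placeholder for the custom behavior.
fancycat(args...; dims) = "do something fancy"

# Own entry point: no methods are added to Base.cat, so no type piracy.
mycat(args...; dims) =
    hascustom(args...) ? fancycat(args...; dims = dims) : cat(args...; dims = dims)

v = [1, 2, 3]
w = MyCustomArray([1, 2, 3])
mycat(v, v; dims = 2)     # falls through to Base.cat
mycat(v, w, v; dims = 2)  # takes the custom branch
```

The custom array can sit in any position, which is what the original attempt could not handle.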

Evaluate expression with local variables

I'm writing a genetic program in order to test the fitness of randomly generated expressions. Shown here is the function to generate the expression as well as the main function. DIV and GT are defined elsewhere in the code:
function create_single_full_tree(depth, fs, ts)
"""
Creates a single AST with full depth
Inputs
depth Current depth of tree. Initially called from main() with max depth
fs Function Set - Array of allowed functions
ts Terminal Set - Array of allowed terminal values
Output
Full AST of typeof()==Expr
"""
# If we are at the bottom
if depth == 1
# End of tree, return function with two terminal nodes
return Expr(:call, fs[rand(1:length(fs))], ts[rand(1:length(ts))], ts[rand(1:length(ts))])
else
# Not end of expression, recursively go back through and create functions for each new node
return Expr(:call, fs[rand(1:length(fs))], create_single_full_tree(depth-1, fs, ts), create_single_full_tree(depth-1, fs, ts))
end
end
function main()
"""
Main function
"""
# Define functional and terminal sets
fs = [:+, :-, :DIV, :GT]
ts = [:x, :v, -1]
# Create the tree
ast = create_single_full_tree(4, fs, ts)
#println(typeof(ast))
#println(ast)
#println(dump(ast))
x = 1
v = 1
eval(ast) # Error out unless x and v are globals
end
main()
I am generating a random expression based on certain allowed functions and variables. As seen in the code, the expression can only have symbols x and v, as well as the value -1. I will need to test the expression with a variety of x and v values; here I am just using x=1 and v=1 to test the code.
The expression is being returned correctly, however, eval() can only be used with global variables, so it will error out when run unless I declare x and v to be global (ERROR: LoadError: UndefVarError: x not defined). I would like to avoid globals if possible. Is there a better way to generate and evaluate these generated expressions with locally defined variables?
Here is an example for generating an (anonymous) function. The result of eval can be called as a function and your variable can be passed as parameters:
myfun = eval(Expr(:->,:x, Expr(:block, Expr(:call,:*,3,:x) )))
myfun(14)
# returns 42
The dump function is very useful for inspecting the expression that the parser has created. For two input arguments you would use a tuple as args[1], for example:
julia> dump(parse("(x,y) -> 3x + y"))
Expr
head: Symbol ->
args: Array{Any}((2,))
1: Expr
head: Symbol tuple
args: Array{Any}((2,))
1: Symbol x
2: Symbol y
typ: Any
2: Expr
[...]
Does this help?
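Applied to the question, the same idea could look roughly like this: wrap the generated tree in an anonymous function of x and v, evaluate that once, and then call it with local values. DIV and GT below are guessed stand-ins for the question's helpers, and the fixed ast replaces a random tree.

```julia
# Hypothetical stand-ins for the DIV and GT helpers that the question defines elsewhere.
DIV(a, b) = b == 0 ? 1.0 : a / b
GT(a, b) = a > b ? 1 : 0

# A fixed expression standing in for one randomly generated tree.
ast = :(DIV(x + v, GT(x, -1) - v))

# Wrap the tree in an anonymous function of (x, v) and evaluate it once.
fn = eval(:((x, v) -> $ast))

# invokelatest sidesteps world-age errors when this runs inside another function.
Base.invokelatest(fn, 4, 2)
```

Here x and v are ordinary function parameters, so no globals are needed.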
In the Metaprogramming part of the Julia documentation, there is a sentence under the eval() and effects section which says
Every module has its own eval() function that evaluates expressions in its global scope.
Similarly, the REPL help ?eval will give you, on Julia 0.6.2, the following help:
Evaluate an expression in the given module and return the result. Every Module (except those defined with baremodule) has its own 1-argument definition of eval, which evaluates expressions in that module.
I assume you are working in the Main module in your example. That's why you need to have the globals defined there. For your problem, you can use macros and interpolate the values of x and v directly inside the macro.
A minimal working example would be:
macro eval_line(a, b, x)
isa(a, Real) || (warn("$a is not a real number."); return :(throw(DomainError())))
isa(b, Real) || (warn("$b is not a real number."); return :(throw(DomainError())))
return :($a * $x + $b) # interpolate the variables
end
Here, the @eval_line macro does the following:
Main> @macroexpand @eval_line(5, 6, 2)
:(5 * 2 + 6)
As you can see, the values of the macro's arguments are interpolated inside the macro and the expression is given to the user accordingly. When the user does not behave,
Main> @macroexpand @eval_line([1,2,3], 7, 8)
WARNING: [1, 2, 3] is not a real number.
:((Main.throw)((Main.DomainError)()))
a user-friendly warning message is provided to the user at parse-time, and a DomainError is thrown at run-time.
Of course, you can do these things within your functions, again by interpolating the variables; you do not need to use macros. However, what you would like to achieve in the end is to combine eval with the output of a function that returns Expr. This is what the macro functionality is for. Finally, you would simply call your macros with an @ sign preceding the macro name:
Main> @eval_line(5, 6, 2)
16
Main> @eval_line([1,2,3], 7, 8)
WARNING: [1, 2, 3] is not a real number.
ERROR: DomainError:
Stacktrace:
[1] eval(::Module, ::Any) at ./boot.jl:235
EDIT 1. You can take this one step further, and create functions accordingly:
macro define_lines(linedefs)
for (name, a, b) in eval(linedefs)
ex = quote
function $(Symbol(name))(x) # interpolate name
return $a * x + $b # interpolate a and b here
end
end
eval(ex) # evaluate the function definition expression in the module
end
end
Then, you can call this macro to create different line definitions in the form of functions to be called later on:
@define_lines([
("identity_line", 1, 0);
("null_line", 0, 0);
("unit_shift", 0, 1)
])
identity_line(5) # returns 5
null_line(5) # returns 0
unit_shift(5) # returns 1
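For comparison, the same family of line functions can be built without eval or macros by returning closures. This is a different technique from the macro above, and make_line, linedefs and lines are names chosen only for this sketch.

```julia
# Build a line x -> a*x + b as a closure; no eval or macro involved.
make_line(a, b) = x -> a * x + b

linedefs = [("identity_line", 1, 0), ("null_line", 0, 0), ("unit_shift", 0, 1)]

# Keep the generated functions in a Dict keyed by name.
lines = Dict(name => make_line(a, b) for (name, a, b) in linedefs)

lines["identity_line"](5)  # 1*5 + 0
lines["unit_shift"](5)     # 0*5 + 1
```

The functions live in a Dict rather than as global names, which tends to be easier to manage when many of them are generated.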
EDIT 2. You can, I guess, achieve what you would like to achieve by using a macro similar to that below:
macro random_oper(depth, fs, ts)
operations = eval(fs)
oper = operations[rand(1:length(operations))]
terminals = eval(ts)
ts = terminals[rand(1:length(terminals), 2)]
ex = :($oper($ts...))
for d in 2:depth
oper = operations[rand(1:length(operations))]
t = terminals[rand(1:length(terminals))]
ex = :($oper($ex, $t))
end
return ex
end
which will give the following, for instance:
Main> @macroexpand @random_oper(1, [+, -, /], [1,2,3])
:((-)([3, 3]...))
Main> @macroexpand @random_oper(2, [+, -, /], [1,2,3])
:((+)((-)([2, 3]...), 3))
Thanks Arda for the thorough response! This helped, but part of me thinks there may be a better way to do this as it seems too roundabout. Since I am writing a genetic program, I will need to create 500 of these ASTs, all with random functions and terminals from a set of allowed functions and terminals (fs and ts in the code). I will also need to test each function with 20 different values of x and v.
In order to accomplish this with the information you have given, I have come up with the following macro:
macro create_function(defs)
for name in eval(defs)
ex = quote
function $(Symbol(name))(x,v)
fs = [:+, :-, :DIV, :GT]
ts = [x,v,-1]
return create_single_full_tree(4, fs, ts)
end
end
eval(ex)
end
end
I can then supply a list of 500 random function names in my main() function, such as ["func1", "func2", "func3", ...], which I can then evaluate with any x and v values in my main function. This has solved my issue; however, it seems to be a very roundabout way of doing this, and may make it difficult to evolve each AST with each iteration.
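One less roundabout option, sketched here as an alternative rather than as what the answers above propose, is to skip eval entirely and walk the Expr tree directly, looking up x and v in a local environment. DIV and GT are again guessed stand-ins, and evaltree and FUNCS are names invented for this example.

```julia
# Hypothetical stand-ins for the question's DIV and GT.
DIV(a, b) = b == 0 ? 1.0 : a / b
GT(a, b) = a > b ? 1 : 0

# Map the symbols used in the function set to actual functions.
const FUNCS = Dict(:+ => +, :- => -, :DIV => DIV, :GT => GT)

# Terminals: numbers evaluate to themselves, symbols are looked up in env.
evaltree(ex::Number, env) = ex
evaltree(ex::Symbol, env) = env[ex]

# Inner nodes: evaluate the children, then apply the function.
function evaltree(ex::Expr, env)
    ex.head === :call || throw(ArgumentError("only call expressions are supported"))
    f = FUNCS[ex.args[1]]
    return f(map(a -> evaltree(a, env), ex.args[2:end])...)
end

ast = Expr(:call, :+, Expr(:call, :DIV, :x, :v), -1)

evaltree(ast, (x = 6, v = 3))                       # one (x, v) pair
[evaltree(ast, (x = x, v = 2)) for x in (2, 4, 6)]  # several x values
```

Because the tree stays a plain Expr, it remains available for mutation or crossover between evaluations.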

Check if a function has keyword arguments in Julia

Is there a way to check if a function has keyword arguments in Julia? I am looking for something like has_kwargs(fun::Function) that would return true if fun has a method with keyword arguments.
The high level idea is to build a function:
function master_fun(foo::Any, fun::Function, ar::Tuple, kw::Tuple)
if has_kwargs(fun)
fun(ar... ; kw...)
else
fun(ar...)
end
end
Basically, @Michael K. Borregaard's suggestion to use try-catch is correct and officially works.
Looking into the unofficial implementation details, I came up with the following:
haskw(f,tup) = isdefined(typeof(f).name.mt,:kwsorter) &&
length(methods(typeof(f).name.mt.kwsorter,(Vector{Any},typeof(f),tup...)))>0
This function first looks if there is any keyword processing on any method of the generic function, and if so, looks at the specific tuple of types.
For example:
julia> f(x::Int) = 1
f (generic function with 1 method)
julia> f(x::String ; y="value") = 2
f (generic function with 2 methods)
julia> haskw(f,(Int,))
false
julia> haskw(f,(String,))
true
This should be tested for the specific application, as it probably doesn't work when non-leaf types are involved. As Michael commented, in the question's context the statement would be:
if haskw(fun, typeof.(ar))
...
I don't think you can guarantee that a given function has keyword arguments. Check
f(;x = 3) = println(x)
f(x) = println(2x)
f(3)
#6
f(x = 3)
#3
f(3, x = 3)
#ERROR: MethodError: no method matching f(::Int64; x=3)
#Closest candidates are:
# f(::Any) at REPL[2]:1 got unsupported keyword argument "x"
# f(; x) at REPL[1]:1
So, does the f function have keywords? You can only check for a given method. Note that, in your example above, you'd normally just do
function master_fun(foo, fun::Function, ar::Tuple, kw...)
fun(ar... ; kw...)
end
which should work, and if keywords are passed to a function that does not take them you'd just leave the error reporting to fun. If that is not acceptable you could try to wrap the fun(ar...; kw...) in a try-catch block.
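On Julia 1.2 and later there is also a documented way to ask the question per method: hasmethod accepts a tuple of keyword names as a third argument. The sketch below assumes the keywords arrive as a NamedTuple; call_with_kwargs is a made-up name, and the question's unused foo argument is left out.

```julia
f(x::Int) = 1
f(x::String; y = "value") = 2

# Does the method matching these argument types accept the keyword y?
hasmethod(f, Tuple{Int}, (:y,))
hasmethod(f, Tuple{String}, (:y,))

# Hypothetical variant of master_fun: pass the keywords only if the method takes them.
function call_with_kwargs(fun, ar::Tuple, kw::NamedTuple)
    if hasmethod(fun, Tuple{map(typeof, ar)...}, keys(kw))
        return fun(ar...; kw...)
    else
        return fun(ar...)
    end
end

call_with_kwargs(f, (1,), (y = 3,))    # keywords dropped, calls f(1)
call_with_kwargs(f, ("a",), (y = 3,))  # calls f("a"; y = 3)
```

Silently dropping keywords may hide mistakes, so letting fun raise the error can still be the better default.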

Convert Dict to DataFrame in Julia

Suppose I have a Dict defined as follows:
x = Dict{AbstractString,Array{Integer,1}}("A" => [1,2,3], "B" => [4,5,6])
I want to convert this to a DataFrame object (from the DataFrames module). Constructing a DataFrame has a similar syntax to constructing a dictionary. For example, the above dictionary could be manually constructed as a data frame as follows:
DataFrame(A = [1,2,3], B = [4,5,6])
I haven't found a direct way to get from a dictionary to a data frame but I figured one could exploit the syntactic similarity and write a macro to do this. The following doesn't work at all but it illustrates the approach I had in mind:
macro dict_to_df(x)
typeof(eval(x)) <: Dict || throw(ArgumentError("Expected Dict"))
return quote
DataFrame(
for k in keys(eval(x))
@eval ($k) = $(eval(x)[$k])
end
)
end
end
I also tried writing this as a function, which does work when all dictionary values have the same length:
function dict_to_df(x::Dict)
s = "DataFrame("
for k in keys(x)
v = x[k]
if typeof(v) <: AbstractString
v = string('"', v, '"')
end
s *= "$(k) = $(v),"
end
s = chop(s) * ")"
return eval(parse(s))
end
Is there a better, faster, or more idiomatic approach to this?
Another method could be
DataFrame(Any[values(x)...],Symbol[map(Symbol,keys(x))...])
It was a bit tricky to get the types in order to access the right constructor. To get a list of the constructors for DataFrames I used methods(DataFrame).
The DataFrame(a=[1,2,3]) way of creating a DataFrame uses keyword arguments. To use splatting (...) for keyword arguments the keys need to be symbols. In the example x has strings, but these can be converted to symbols. In code, this is:
DataFrame(;[Symbol(k)=>v for (k,v) in x]...)
Finally, things would be cleaner if x had used symbols as keys from the start. Then the code would be:
x = Dict{Symbol,Array{Integer,1}}(:A => [1,2,3], :B => [4,5,6])
df = DataFrame(;x...)
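The Symbol requirement is a property of keyword splatting itself, not of DataFrames, so it can be seen with a plain function. In this sketch, columns is a made-up stand-in for a keyword-based constructor.

```julia
# Hypothetical stand-in for a keyword-based constructor such as DataFrame(; columns...).
columns(; kwargs...) = Dict(kwargs)

x = Dict("A" => [1, 2, 3], "B" => [4, 5, 6])

# Converting the String keys to Symbols makes the pairs splattable as keywords.
d = columns(; [Symbol(k) => v for (k, v) in x]...)

# Splatting the String-keyed Dict directly is rejected.
string_keys_fail = try
    columns(; x...)
    false
catch
    true
end
```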

Resources