Convert Dict to DataFrame in Julia - dictionary

Suppose I have a Dict defined as follows:
x = Dict{AbstractString,Array{Integer,1}}("A" => [1,2,3], "B" => [4,5,6])
I want to convert this to a DataFrame object (from the DataFrames module). Constructing a DataFrame has a similar syntax to constructing a dictionary. For example, the above dictionary could be manually constructed as a data frame as follows:
DataFrame(A = [1,2,3], B = [4,5,6])
I haven't found a direct way to get from a dictionary to a data frame but I figured one could exploit the syntactic similarity and write a macro to do this. The following doesn't work at all but it illustrates the approach I had in mind:
macro dict_to_df(x)
typeof(eval(x)) <: Dict || throw(ArgumentError("Expected Dict"))
return quote
DataFrame(
for k in keys(eval(x))
#eval ($k) = $(eval(x)[$k])
end
)
end
end
I also tried writing this as a function, which does work when all dictionary values have the same length:
function dict_to_df(x::Dict)
s = "DataFrame("
for k in keys(x)
v = x[k]
if typeof(v) <: AbstractString
v = string('"', v, '"')
end
s *= "$(k) = $(v),"
end
s = chop(s) * ")"
return eval(parse(s))
end
Is there a better, faster, or more idiomatic approach to this?

Another method could be
DataFrame(Any[values(x)...],Symbol[map(symbol,keys(x))...])
It was a bit tricky to get the types in order to access the right constructor. To get a list of the constructors for DataFrames I used methods(DataFrame).
The DataFrame(a=[1,2,3]) way of creating a DataFrame uses keyword arguments. To use splatting (...) for keyword arguments the keys need to be symbols. In the example x has strings, but these can be converted to symbols. In code, this is:
DataFrame(;[Symbol(k)=>v for (k,v) in x]...)
Finally, things would be cleaner if x had originally been with symbols. Then the code would go:
x = Dict{Symbol,Array{Integer,1}}(:A => [1,2,3], :B => [4,5,6])
df = DataFrame(;x...)

Related

Why (; [(:x, 1), (:y, 2)]...) creates a NamedTuple?

I'm still learning Julia, and I recently came across the following code excerpt that flummoxed me:
res = (; [(:x, 10), (:y, 20)]...) # why the semicolon in front?
println(res) # (x = 10, y = 20)
println(typeof(res)) # NamedTuple{(:x, :y), Tuple{Int64, Int64}}
I understand the "splat" operator ..., but what happens when the semicolon appear first in a tuple? In other words, how does putting a semicolon in (; [(:x, 10), (:y, 20)]...) create a NamedTuple? Is this some undocumented feature/trick?
Thanks for any pointers.
Yes, this is actually a documented feature, but perhaps not a very well known one. As the documentation for NamedTuple notes:
help?> NamedTuple
search: NamedTuple #NamedTuple
NamedTuple
NamedTuples are, as their name suggests, named Tuples. That is, they're a tuple-like
collection of values, where each entry has a unique name, represented as a Symbol.
Like Tuples, NamedTuples are immutable; neither the names nor the values can be
modified in place after construction.
Accessing the value associated with a name in a named tuple can be done using field
access syntax, e.g. x.a, or using getindex, e.g. x[:a]. A tuple of the names can be
obtained using keys, and a tuple of the values can be obtained using values.
[... some other non-relevant parts of the documentation omitted ...]
In a similar fashion as to how one can define keyword arguments programmatically, a
named tuple can be created by giving a pair name::Symbol => value or splatting an
iterator yielding such pairs after a semicolon inside a tuple literal:
julia> (; :a => 1)
(a = 1,)
julia> keys = (:a, :b, :c); values = (1, 2, 3);
julia> (; zip(keys, values)...)
(a = 1, b = 2, c = 3)
As in keyword arguments, identifiers and dot expressions imply names:
julia> x = 0
0
julia> t = (; x)
(x = 0,)
julia> (; t.x)
(x = 0,)

Get names of keywords for Julia methods

I have a function like
function f(a = 1; first = 5, second = "asdf")
return a
end
Is there any way to programatically return a vector with the names of the keyword arguments. Something like:
kwargs(f)
# returns [:first, :second]
I realise that this might be complicated by having multiple methods for a functionname. But I was hoping this would still be possible if the exact method is specified. For instance:
kwargs(methods(f).ms[1])
# returns [:first, :second]
Just use Base.kwarg_decl()
julia> Base.kwarg_decl.(methods(f))
2-element Vector{Vector{Symbol}}:
[]
[:first, :second]
If you need the first parameter a as well you could also try:
julia> Base.method_argnames.(methods(f))
2-element Vector{Vector{Symbol}}:
[Symbol("#self#")]
[Symbol("#self#"), :a]

Is it possible to pre-allocate array for matrix factorization?

My question is instead of F = svd(A), can one first allocate an appropriate memory for an SVD structure, and then do F .= svd(A) ?
What I had in mind is something like the following:
function main()
F = Vector{SVD}(undef,10)
# how to preallocate F?
test(F)
end
function test(F::Vector{SVD})
for i in 1:10
F .= svd(rand(3,3))
end
end
Your code almost works. But what you probably wanted was this:
using LinearAlgebra
function main()
F = Vector{SVD}(undef, 10)
test(F)
end
function test(F::Vector{SVD})
for i in 1:10
F[i] = svd(rand(3, 3))
end
return F
end
The line that you had in the for loop was this:
F .= svd(rand(3,3))
which does the same operation on every loop, since you were not indexing into F. In particular, this operation was trying to broadcast a single SVD object into all the fields of F on each iteration of the loop. (And that broadcast operation failed because by default structs are treated as iterable objects with a length method, but SVD does not have a length method.)
However, I would recommend against pre-allocating a vector in this situation. First, let's look at the type of F:
julia> typeof(Vector{SVD}(undef, 10))
Array{SVD,1}
The problem with this vector is that it is parameterized by an abstract type. There is a section in the Performance Tips chapter of the manual that advises against this. SVD is an abstract type because the types of its parameters have not been specified. To make it concrete, you need to specify the types of the parameters, like this:
julia> SVD{Float64,Float64,Array{Float64,2}}
SVD{Float64,Float64,Array{Float64,2}}
julia> Vector{SVD{Float64,Float64,Array{Float64,2}}}(undef, 2)
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
#undef
#undef
As you can see, it is difficult to correctly specify the concrete type when you are working with complicated types like SVD. Additionally, if you do so, your code will not be as generic as it could be.
A better approach for a problem like this is to use mapping, broadcasting, or a list comprehension. Then the correct output type will automatically be inferred. Here are some examples:
List comprehension
julia> [svd(rand(3, 3)) for _ in 1:2]
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
SVD{Float64,Float64,Array{Float64,2}}([-0.6357040496635746 -0.2941425771794837 -0.7136949667270628; -0.45459999623274916 -0.6045700314848496 0.654090147040599; -0.6238743500629883 0.7402534845042064 0.2506104028424691], [1.4535849689665463, 0.7212190827260345, 0.05010669163393896], [-0.5975505057447164 -0.588792736048385 -0.5442945039782142; 0.7619724725128861 -0.6283345569895092 -0.15682358121595258; -0.2496624605679292 -0.5084474392397449 0.8241054891903787])
SVD{Float64,Float64,Array{Float64,2}}([-0.5593632049776268 0.654338345992878 -0.5088753618327984; -0.6687620652652163 -0.7189576326033171 -0.18936003428293915; -0.4897653570633183 0.23439550227070827 0.8397551092645418], [1.8461274187259178, 0.21226179692488983, 0.14194607536315287], [-0.29089551972856004 -0.7086270946133293 -0.6428276887173754; -0.9203610429640889 0.023709029028269546 0.390350397126212; 0.2613720474647311 -0.7051847436823973 0.6590896221923739])
Map
julia> map(_ -> svd(rand(3, 3)), 1:2)
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
SVD{Float64,Float64,Array{Float64,2}}([-0.5807809149601634 0.5635242755434755 0.5874809951745127; -0.6884131975465821 0.0451903888051729 -0.7239095925620322; -0.43448912329507794 -0.8248625459025509 0.3616918330643316], [1.488618654040125, 0.4122166626927311, 0.004235624485479941], [-0.6721098925787947 -0.2684664121709399 -0.6900681689759235; -0.7384292974335966 0.31185073633575333 0.5978890289498324; -0.05468514413847799 -0.9114136842196914 0.4078414290231468])
SVD{Float64,Float64,Array{Float64,2}}([-0.3677873424759118 0.8090638526628051 -0.4584191892023337; -0.43071684640222546 -0.5851169278783189 -0.6871107472129654; -0.8241452960126802 -0.055261768200600137 0.5636760310989947], [1.6862363968739773, 0.5899255050748418, 0.24246688716190598], [-0.3751742784957875 -0.7172409091515735 -0.5872050229643736; 0.8600668700980193 -0.505618838823938 0.06807766730822862; -0.3457300098559026 -0.4794945964927631 0.8065703268899])
Broadcasting
julia> g = (rand(3, 3) for _ in 1:2)
Base.Generator{UnitRange{Int64},var"#17#18"}(var"#17#18"(), 1:2)
julia> svd.(g)
2-element Array{SVD{Float64,Float64,Array{Float64,2}},1}:
SVD{Float64,Float64,Array{Float64,2}}([-0.7988295268840152 0.5443221484534134 -0.256095266807727; -0.5436890668169485 -0.8354777569473182 -0.0798693700362902; -0.257436566171119 0.07543418554831638 0.963346302244777], [1.8188722412547844, 0.3934389096422389, 0.2020398396772306], [-0.7147404794808727 -0.37763644211761316 -0.5886737335538281; -0.6944558966482991 0.4830041206449164 0.5333273169925189; -0.08292800854873916 -0.7899985677359054 0.607474450798845])
SVD{Float64,Float64,Array{Float64,2}}([-0.5910620103531503 0.3599866268397522 0.7218416228050514; -0.7367495542691711 0.12340124384185132 -0.664809918173956; -0.3283988340440176 -0.9247603805931685 0.1922821996018057], [1.826019614357666, 0.5333148215847028, 0.11639139812894106], [-0.6415954756495915 -0.6888196183142843 -0.33746522643279503; -0.5845558664639438 0.7239484700883465 -0.3663236978948133; -0.4966383841474222 0.037764349353666515 0.8671356118331964])
Furthermore, mapping, broadcasting, and list comprehensions should be just as efficient as pre-allocating the vector. If you're doing a simple mapping, then it's usually easier and more readable to use mapping, broadcasting, or list comprehensions. Pre-allocating vectors is a tool I reserve for writing custom algorithms from scratch.
A final note. In most cases, type parameters are considered an implementation detail and are not a part of the public API for a type. As such, it's best to use generic programming approaches that do not rely on fixing the types for type parameters. Of course there are some exceptions to this rule of thumb, like Array{T,N} and Dict{K,V}.
There's a differnent way of preallocation -- you can reuse the input array by always overwriting it, with both the rand call and svd's internal needs:
function test!(F::Vector{SVD})
A = Matrix{Float64}(undef, 3, 3)
for i in 1:10
rand!(A)
F[i] = svd!(A)
end
end
Cameron's advice still holds. I'd probably use something like
function test()
A = Matrix{Float64}(undef, 3, 3)
return map(1:10) do i
svd!(rand!(A))
end
end
given that the number of loops seems not be the critical part.

Julia : construct Dictionary with tuple values

Is there a possibility to construct dictionary with tuple values in Julia?
I tried
dict = Dict{Int64, (Int64, Int64)}()
dict = Dict{Int64, Tuple(Int64, Int64)}()
I also tried inserting tuple values but I was able to change them after so they were not tuples.
Any idea?
Edit:
parallel_check = Dict{Any, (Any, Any)}()
for i in 1:10
dict[i] = (i+41, i+41)
end
dict[1][2] = 1 # not able to change this way, setindex error!
dict[1] = (3, 5) # this is acceptable. why?
The syntax for tuple types (i.e. the types of tuples) changed from (Int64,Int64) in version 0.3 and earlier to Tuple{Int64,Int64} in 0.4. Note the curly braces, not parens around Int64,Int64. You can also discover this at the REPL by applying the typeof function to an example tuple:
julia> typeof((1,2))
Tuple{Int64,Int64}
So you can construct the dictionary you want like this:
julia> dict = Dict{Int64,Tuple{Int64,Int64}}()
Dict{Int64,Tuple{Int64,Int64}} with 0 entries
julia> dict[1] = (2,3)
(2,3)
julia> dict[2.0] = (3.0,4)
(3.0,4)
julia> dict
Dict{Int64,Tuple{Int64,Int64}} with 2 entries:
2 => (3,4)
1 => (2,3)
The other part of your question is unrelated, but I'll answer it here anyway: tuples are immutable – you cannot change one of the elements in a tuple. Dictionaries, on the other hand are mutable, so you can assign an entirely new tuple value to a slot in a dictionary. In other words, when you write dict[1] = (3,5) you are assigning into dict, which is ok, but when you write dict[1][2] = 1 you are assigning into the tuple at position 1 in dict which is not ok.

Julia: Index an array with Int64?

The following code complains
ERROR: `setindex!` has no method matching setindex!(::Type{Array{Int32,32}}, ::Int32, ::Int64)
Should I be able to do this? The problem, I think, is that the loop variable has the wrong type to be used as an array index?
n = parseint(readline(STDIN))
A = Array{Int32, n}
for i in 1:n-1
ai = parseint(Int32, readuntil(STDIN, ' '))
A[i] = ai #The error happens here!
end
A[n] = parseint(Int32, readline(STDIN))
Your assignment of A is legal, but it doesn't do what you think it does.
A = Array{Int32,n}
julia> typeof(A)
DataType
This declares an A to be the type representing an array of n dimensions. What you want instead, probably is A to be a variable of type Array{Int32,1} that contains n elements. So instead try the following:
A = Array(Int32,n);
julia> typeof(A)
Array{Int32,1}

Resources