This is a constructor for arrays:
Array{T}(undef, dims)
I am new to Julia, and don't have a good background in programming. In this syntax, why is undef used for creating the array?
What is a constructor in Julia, in what situation do we use a constructor?
If we don't type constructor, Julia will automatically create a constructor. Then, Why we use constructor?
First, you want to understand what is a constructor:
For that, I suggest you the Julia doc: Constructors in Julia
Now that you have the theory, let's break apart this expression:
a = Array{Int}(undef, (2, 2))
What this expression is saying is "I want a to be an Array of dimension (2, 2)". So Julia will ask for some memory space. When I write it on the Julia REPL:
julia> a = Array{Int}(undef, (2, 2))
2×2 Array{Int64,2}:
0 0
0 0
Now Array{T}(undef, dims) is the generalization of that. "Construct an array of a specific type T with a specific number of dimensions dims"
So far, I didn't explain what is undef. undef is a shortcut for UndefInitializer(). In this example, we wanted an uninitialized array. What does it mean? For that, you have to understand that variables are not created ex nihilo on your terminal. They are occupying a specific place in the memory of your computer. And sometimes, the same memory space was occupied by another variable. So the space my new variable can take might not be empty:
julia> a = Array{Float64}(undef, (2, 2))
2×2 Array{Float64,2}:
6.94339e-310 6.94339e-310
6.94339e-310 0.0
Here, I never asked for these values to be there. I could erase it to work with a clean variable. But that would mean to erase the value for each cell, and it's much more expensive for the computer to replace each value rather than declaring "here is the new variable".
So basically, undef and uninitialized arrays are used for performance purposes. If you want an array well initialized, you can use fill.
Related
I have an array representing image metadata that I want to clear out. Currently, I use the empty!() function but it is throwing an error because I think it needs to be a dictionary and I have an array so I am looking for an alternate way to clear the Array.
For reference, here is the type I am working with:
Array{ImageMetadata.ImageMeta{ColorTypes.Gray{FixedPointNumbers.Normed{UInt8,8}},2,Array{ColorTypes.Gray{FixedPointNumbers.Normed{UInt8,8}},2},P} where P<:AbstractDict{Symbol,Any},1}
empty! shouldn't be throwing an error:
julia> x = rand(10);
julia> empty!(x)
0-element Array{Float64,1}
Would be useful to know what error you're actually seeing.
I am not sure what you mean by "clear", but in general you can use fill! to fill the array with a value of your choice.
However, if your question is to make an array filled with some values to be set to have #undef in its entries (if its eltype allows for it), then I do not know of a method to make this. You can use similar to create a new array that will have the same shape and be uninitialized.
I am looking for different ways of writing for loops in Julia! I know this is a basic question but I'm wondering what some of the different options are and if there are advantages/disadvantages with respect to performance.
For loop
Pro: fully flexible has break and continue
Con: no return, must specify iterator at start
While loop
Pro: fully flexible has break and continue
Con: no return, if iterator must be handled manually
Label+goto
Please don't use this for loops
Generator comprehension/Vector comprehension
Pro: Has return value, continue is expressed with filter clause, comes in lazy (generator) and eager forms (vector), can create multidimensional return vale
Con: really ugly for anything long, no break
Broadcast
Pro: express transform of multiple input susictly, has return value with output structure matching what it should be. Can be expressed with just a dot and supports loop fusion.
Con: no break no contine. Writing body means writing a function. Wrapping things you want to broadcast as scalar in Ref is a bit ugly
Map/pmap/asyncmap
Written in do-block form
Pro: can easily change to run distributed or asynchronously, had a return value
Con: no break, no continue
foreach function
It is a lot like map but no return value. So save on allocating that.
Other than that same pros and cons
This is straight from the Julia docs:
The for loop makes common repeated evaluation idioms easier to write. Since counting up and down like the above while loop does is so common, it can be expressed more concisely with a for loop:
julia> for i = 1:5
println(i)
end
1
2
3
4
5
Here the 1:5 is a range object, representing the sequence of numbers 1, 2, 3, 4, 5. The for loop iterates through these values, assigning each one in turn to the variable i. One rather important distinction between the previous while loop form and the for loop form is the scope during which the variable is visible. If the variable i has not been introduced in another scope, in the for loop form, it is visible only inside of the for loop, and not outside/afterward. You'll either need a new interactive session instance or a different variable name to test this:
julia> for j = 1:5
println(j)
end
1
2
3
4
5
julia> j
ERROR: UndefVarError: j not defined
See Scope of Variables for a detailed explanation of the variable scope and how it works in Julia.
In general, the for loop construct can iterate over any container. In these cases, the alternative (but fully equivalent) keyword in or ∈ is typically used instead of =, since it makes the code read more clearly:
julia> for i in [1,4,0]
println(i)
end
1
4
0
julia> for s ∈ ["foo","bar","baz"]
println(s)
end
foo
bar
baz
Various types of iterable containers will be introduced and discussed in later sections of the manual (see, e.g., Multi-dimensional Arrays).
Just playing around with Julia (1.0) and one thing that I need to use a lot in Python/numpy/matlab is the squeeze function to drop the singleton dimensions.
I found out that one way to do this in Julia is:
a = rand(3, 3, 1);
a = dropdims(a, dims = tuple(findall(size(a) .== 1)...))
The second line seems a bit cumbersome and not easy to read and parse instantly (this could also be my bias that I bring from other languages). However, I wonder if this is the canonical way to do this in Julia?
The actual answer to this question surprised me. What you are asking could be rephrased as:
why doesn't dropdims(a) remove all singleton dimensions?
I'm going to quote Tim Holy from the relevant issue here:
it's not possible to have squeeze(A) return a type that the compiler
can infer---the sizes of the input matrix are a runtime variable, so
there's no way for the compiler to know how many dimensions the output
will have. So it can't possibly give you the type stability you seek.
Type stability aside, there are also some other surprising implications of what you have written. For example, note that:
julia> f(a) = dropdims(a, dims = tuple(findall(size(a) .== 1)...))
f (generic function with 1 method)
julia> f(rand(1,1,1))
0-dimensional Array{Float64,0}:
0.9939103383167442
In summary, including such a method in Base Julia would encourage users to use it, resulting in potentially type-unstable code that, under some circumstances, will not be fast (something the core developers are strenuously trying to avoid). In languages like Python, rigorous type-stability is not enforced, and so you will find such functions.
Of course, nothing stops you from defining your own method as you have. And I don't think you'll find a significantly simpler way of writing it. For example, the proposition for Base that was not implemented was the method:
function squeeze(A::AbstractArray)
singleton_dims = tuple((d for d in 1:ndims(A) if size(A, d) == 1)...)
return squeeze(A, singleton_dims)
end
Just be aware of the potential implications of using it.
Let me simply add that "uncontrolled" dropdims (drop any singleton dimension) is a frequent source of bugs. For example, suppose you have some loop that asks for a data array A from some external source, and you run R = sum(A, dims=2) on it and then get rid of all singleton dimensions. But then suppose that one time out of 10000, your external source returns A for which size(A, 1) happens to be 1: boom, suddenly you're dropping more dimensions than you intended and perhaps at risk for grossly misinterpreting your data.
If you specify those dimensions manually instead (e.g., dropdims(R, dims=2)) then you are immune from bugs like these.
You can get rid of tuple in favor of a comma ,:
dropdims(a, dims = (findall(size(a) .== 1)...,))
I'm a bit surprised at Colin's revelation; surely something relying on 'reshape' is type stable? (plus, as a bonus, returns a view rather than a copy).
julia> function squeeze( A :: AbstractArray )
keepdims = Tuple(i for i in size(A) if i != 1);
return reshape( A, keepdims );
end;
julia> a = randn(2,1,3,1,4,1,5,1,6,1,7);
julia> size( squeeze(a) )
(2, 3, 4, 5, 6, 7)
No?
In Julia, say I have an object_id for a variable but have forgotten its name, how can I retrieve the object using the id?
I.e. I want the inverse of some_id = object_id(some_object).
As #DanGetz says in the comments, object_id is a hash function and is designed not to be invertible. #phg is also correct that ObjectIdDict is intended precisely for this purpose (it is documented although not discussed much in the manual):
ObjectIdDict([itr])
ObjectIdDict() constructs a hash table where the keys are (always)
object identities. Unlike Dict it is not parameterized on its key and
value type and thus its eltype is always Pair{Any,Any}.
See Dict for further help.
In other words, it hashes objects by === using object_id as a hash function. If you have an ObjectIdDict and you use the objects you encounter as the keys into it, then you can keep them around and recover those objects later by taking them out of the ObjectIdDict.
However, it sounds like you want to do this without the explicit ObjectIdDict just by asking which object ever created has a given object_id. If so, consider this thought experiment: if every object were always recoverable from its object_id, then the system could never discard any object, since it would always be possible for a program to ask for that object by ID. So you would never be able to collect any garbage, and the memory usage of every program would rapidly expand to use all of your RAM and disk space. This is equivalent to having a single global ObjectIdDict which you put every object ever created into. So inverting the object_id function that way would require never deallocating any objects, which means you'd need unbounded memory.
Even if we had infinite memory, there are deeper problems. What does it mean for an object to exist? In the presence of an optimizing compiler, this question doesn't have a clear-cut answer. It is often the case that an object appears, from the programmer's perspective, to be created and operated on, but in reality – i.e. from the hardware's perspective – it is never created. Consider this function which constructs a complex number and then uses it for a simple computation:
julia> function f(y::Real)
z = Complex(0,y)
w = 2z*im
return real(w)
end
f (generic function with 1 method)
julia> foo(123)
-246
From the programmer's perspective, this constructs the complex number z and then constructs 2z, then 2z*im, and finally constructs real(2z*im) and returns that value. So all of those values should be inserted into the "Great ObjectIdDict in the Sky". But are they really constructed? Here's the LLVM code for this function applied to an Int:
julia> #code_llvm foo(123)
define i64 #julia_foo_60833(i64) #0 !dbg !5 {
top:
%1 = shl i64 %0, 1
%2 = sub i64 0, %1
ret i64 %2
}
No Complex values are constructed at all! Instead, all of the work is inlined and eliminated instead of actually being done. The whole computation boils down to just doubling the argument (by shifting it left one bit) and negating it (by subtracting it from zero). This optimization can be done first and foremost because the intermediate steps have no observable side effects. The compiler knows that there's no way to tell the difference between actually constructing complex values and operating on them and just doing a couple of integer ops – as long as the end result is always the same. Implicit in the idea of a "Great ObjectIdDict in the Sky" is the assumption that all objects that seem to be constructed actually are constructed and inserted into a large, permanent data structure – which is a massive side effect. So not only is recovering objects from their IDs incompatible with garbage collection, it's also incompatible with almost every conceivable program optimization.
The only other way one could conceive of inverting object_id would be to compute its inverse image on demand instead of saving objects as they are created. That would solve both the memory and optimization problems. Of course, it isn't possible since there are infinitely many possible objects but only a finite number of object IDs. You are vanishingly unlikely to actually encounter two objects with the same ID in a program, but the finiteness of the ID space means that inverting the hash function is impossible in principle since the preimage of each ID value contains an infinite number of potential objects.
I've probably refuted the possibility of an inverse object_id function far more thoroughly than necessary, but it led to some interesting thought experiments, and I hope it's been helpful – or at least thought provoking. The practical answer is that there is no way to get around explicitly stashing every object you might want to get back later in an ObjectIdDict.
I am using this piece of code and a stackoverflow will be triggered, if I use Extlib's Hashtbl the error does not occur. Any hints to use specialized Hashtbl without stackoverflow?
module ColorIdxHash = Hashtbl.Make(
struct
type t = Img_types.rgb_t
let equal = (==)
let hash = Hashtbl.hash
end
)
(* .. *)
let (ctable: int ColorIdxHash.t) = ColorIdxHash.create 256 in
for x = 0 to width -1 do
for y = 0 to height -1 do
let c = Img.get img x y in
let rgb = Color.rgb_of_color c in
if not (ColorIdxHash.mem ctable rgb) then ColorIdxHash.add ctable rgb (ColorIdxHash.length ctable)
done
done;
(* .. *)
The backtrace points to hashtbl.ml:
Fatal error: exception Stack_overflow Raised at file "hashtbl.ml",
line 54, characters 16-40 Called from file "img/write_bmp.ml", line
150, characters 52-108 ...
Any hints?
Well, you're using physical equality (==) to compare the colors in your hash table. If the colors are structured values (I can't tell from this code), none of them will be physically equal to each other. If all the colors are distinct objects, they will all go into the table, which could really be quite a large number of objects. On the other hand, the hash function is going to be based on the actual color R,G,B values, so there may well be a large number of duplicates. This will mean that your hash buckets will have very long chains. Perhaps some internal function isn't tail recursive, and so is overflowing the stack.
Normally the length of the longest chain will be 2 or 3, so it wouldn't be surprising that this error doesn't come up often.
Looking at my copy of hashtbl.ml (OCaml 3.12.1), I don't see anything non-tail-recursive on line 54. So my guess might be wrong. On line 54 a new internal array is allocated for the hash table. So another idea is just that your hashtable is just getting too big (perhaps due to the unwanted duplicates).
One thing to try is to use structural equality (=) and see if the problem goes away.
One reason you may have non-termination or stack overflows is if your type contains cyclic values. (==) will terminates on cyclic values (while (=) may not), but Hash.hash is probably not cycle-safe. So if you manipulate cyclic values of type Img_types.rgb_t, you have to devise your one cycle-safe hash function -- typically, calling Hash.hash on only one of the non-cyclic subfields/subcomponents of your values.
I've already been bitten by precisely this issue in the past. Not a fun bug to track down.