I'm wondering about immutable types and performance in Julia.
In which cases does making a composite type immutable improve performance? The documentation says:
They are more efficient in some cases. Types like the Complex example
above can be packed efficiently into arrays, and in some cases the
compiler is able to avoid allocating immutable objects entirely.
I don't really understand the second part.
Are there cases where making a composite type immutable reduces performance (beyond the case where a field needs to be changed by reference)? I thought one example could be when an object of an immutable type is used repeatedly as an argument, since
An object with an immutable type is passed around (both in assignment statements and in function calls) by copying, whereas a mutable type is passed around by reference.
However, I can't find any difference in a simple example:
abstract MyType
type MyType1 <: MyType
v::Vector{Int}
end
immutable MyType2 <: MyType
v::Vector{Int}
end
g(x::MyType) = sum(x.v)
function f(x::MyType)
a = zero(Int)
for i in 1:10_000
a += g(x)
end
return a
end
x = fill(one(Int), 10_000)
x1 = MyType1(x)
@time f(x1)
# elapsed time: 0.030698826 seconds (96 bytes allocated)
x2 = MyType2(x)
@time f(x2)
# elapsed time: 0.031835494 seconds (96 bytes allocated)
So why isn't f slower with an immutable type? Are there cases where using immutable types makes code slower?
Immutable types are especially fast when they are small and consist entirely of immediate data, with no references (pointers) to heap-allocated objects. For example, an immutable type that consists of two Ints can potentially be stored in registers and never exist in memory at all.
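For instance, here is a minimal sketch in current syntax (where struct is immutable by default; TwoInts and sumfields are illustrative names):
struct TwoInts            # small immutable: two Ints, no heap references
    a::Int
    b::Int
end
sumfields(t::TwoInts) = t.a + t.b
sumfields(TwoInts(1, 2))              # warm up so compilation isn't measured
@allocated sumfields(TwoInts(3, 4))   # expect 0: the value never touches the heap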
Knowing that a value won't change also helps us optimize code. For example, you access x.v inside a loop, and since x.v will always refer to the same vector, we can hoist the load for it outside the loop instead of re-loading it on every iteration. However, whether you get any benefit from that depends on whether that load was taking a significant fraction of the time in the loop.
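Written out by hand, that hoisting would look something like this variant of f from the question (a sketch of what the compiler can do automatically when it knows x.v cannot change):
function f_hoisted(x::MyType)
    v = x.v                # field loaded once, before the loop
    a = zero(Int)
    for i in 1:10_000
        a += sum(v)        # no re-load of x.v on each iteration
    end
    return a
end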
It is rare in practice for immutables to slow down code, but there are two cases where it might happen. First, if you have a large immutable type (say 100 Ints) and do something like sorting an array of them where you need to move them around a lot, the extra copying might be slower than pointing to objects with references. Second, immutable objects are usually not allocated on the heap initially. If you need to store a heap reference to one (e.g. in an Any array), we need to move the object to the heap. From there the compiler is often not smart enough to re-use the heap-allocated version of the object, and so might copy it repeatedly. In such a case it would have been faster to just heap-allocate a single mutable object up front.
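The second case can be demonstrated by storing an immutable into an Any container, which forces a heap-allocated copy (a sketch; Big is an illustrative name, and in practice the effect matters for much larger types):
struct Big
    a::Int
    b::Int
    c::Int
    d::Int
end
xs = Any[]                 # element type erased: elements are stored as heap references
push!(xs, Big(1, 2, 3, 4)) # the value is copied into a heap-allocated box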
This test covers only a special case, so it does not generalize and cannot rule out better performance for immutable types.
Check the following test and note the difference in allocations when creating a vector of immutables compared to a vector of mutables:
abstract MyType
type MyType1 <: MyType
i::Int
b::Bool
f::Float64
end
immutable MyType2 <: MyType
i::Int
b::Bool
f::Float64
end
@time x=[MyType2(i,1,1) for i=1:100_000];
# => 0.001396 seconds (2 allocations: 1.526 MB)
@time x=[MyType1(i,1,1) for i=1:100_000];
# => 0.003683 seconds (100.00 k allocations: 3.433 MB)
The vector of immutables stores its elements inline, so building it needs essentially one allocation, while each mutable instance is a separate heap object that the vector merely points to, hence the extra 100,000 allocations.
I am currently working with vectors and trying to ensure I have what is essentially an array of my vector on the stack. I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec. Is this at all possible?
Having read the Rustonomicon on how to implement Vec, it seems to stride over pointers on the heap, dereferencing at each entry. I want to chunk in Vec entries from the heap into the stack for fast access.
You can use the unsized_locals feature in nightly Rust:
#![feature(unsized_locals)]
fn example<T>(v: Vec<T>) {
let s: [T] = *v.into_boxed_slice();
dbg!(std::mem::size_of_val(&s));
}
fn main() {
let x = vec![42; 100];
example(x); // Prints 400
}
See also:
Is there a good way to convert a Vec<T> to an array?
How to get a slice as an array in Rust?
I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec
Sure you can.
Vec [...] seems to stride over pointers on the heap, dereferencing at each entry
Accessing each member in a Vec requires a memory dereference. Accessing each member in an array requires a memory dereference. There's no material difference in speed here.
for fast access
I doubt this will be any faster than directly accessing the data in the Vec. In fact, I wouldn't be surprised if it were slower, since you are copying it.
I have a simple question. I have defined a struct, and I need to instantiate a lot of them (on the order of millions) and loop over them.
I am instantiating one at a time and going through the loop as follows:
using Distributions
mutable struct help_me{Z<:Bool}
can_you_help_me::Z
millions_of_thanks::Z
end
max_iter = 1_000_000  # some large number of iterations, for example
for i in 1:max_iter
tmp_help = help_me(rand(Bernoulli(0.5),1)[1],rand(Bernoulli(0.99),1)[1])
# many follow-up processes
end
The memory allocation scales up in max_iter. For my purpose, I do not need to save each struct. Is there a way to "re-use" the memory allocation used by the struct?
Your main problem lies here:
rand(Bernoulli(0.5),1)[1], rand(Bernoulli(0.99),1)[1]
You are creating a length-1 array and then reading the first element from that array. This allocates unnecessary memory and takes time. Don't create an array here. Instead, write
rand(Bernoulli(0.5)), rand(Bernoulli(0.99))
This will just create random scalar numbers, no array.
Compare timings here:
julia> using BenchmarkTools
julia> @btime rand(Bernoulli(0.5),1)[1]
36.290 ns (1 allocation: 96 bytes)
false
julia> @btime rand(Bernoulli(0.5))
6.708 ns (0 allocations: 0 bytes)
false
Over five times as fast, and no memory allocation.
This seems to be a general issue. Very often I see people writing rand(1)[1], when they should be using just rand().
Also, consider whether you actually need to make the struct mutable, as others have mentioned.
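For illustration, here is an immutable variant of the struct from the question (a sketch; HelpMe and count_help are illustrative names, and the per-iteration work is a stand-in for your follow-up processes):
using Distributions

struct HelpMe              # immutable: instances need not live on the heap
    can_you_help_me::Bool
    millions_of_thanks::Bool
end

function count_help(max_iter)
    acc = 0
    for i in 1:max_iter
        tmp = HelpMe(rand(Bernoulli(0.5)), rand(Bernoulli(0.99)))
        acc += tmp.can_you_help_me + tmp.millions_of_thanks  # stand-in for follow-up work
    end
    return acc
end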
If the structure is not needed anymore (i.e. not referenced anywhere outside the current loop iteration), the Garbage Collector will free up its memory automatically if required.
Otherwise, I agree with the suggestions of Oscar Smith: memory allocation and garbage collection take time, avoid it for performance reasons if possible.
I have the following code:
julia> struct Point
x
y
end
julia> Point(1,2) == Point(1,2)
true
julia> mutable struct Points
x
y
end
julia> Points(1,2) == Points(1,2)
false
Why are the two objects equal when it is a normal struct but not equal when it is a mutable struct?
The reason is that by default == falls back to ===. Now the way === works is (citing the documentation):
First the types of x and y are compared. If those are identical,
mutable objects are compared by address in memory and immutable objects (such as numbers)
are compared by contents at the bit level.
So for Point, which is immutable, the comparison of contents is performed (and it is identical in your case). While Points is mutable, so memory addresses of passed objects are compared and they are different as you have created two distinct objects.
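If you do want value-based equality for the mutable struct, a common approach (a sketch) is to define == field-wise yourself:
import Base: ==

==(a::Points, b::Points) = a.x == b.x && a.y == b.y  # compare contents, not identity

Points(1, 2) == Points(1, 2)   # now true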
Bogumił Kamiński is correct, but you might ask why that difference in the definition of === exists between mutable and immutable types. The reason is that your immutable structs Point are actually indistinguishable. Since they can't change, their values will always be the same, so they might as well be two names for the same object. Therefore, in the language they are defined only by their value.
In contrast, for mutable structs there are at least two ways you can distinguish them. First, since mutable structs usually can't be stack-allocated, they have a memory location, and you can compare the memory locations of the two mutable structs and see that they are different. Second, you can simply mutate one of them and see that only one object changes while the other doesn't.
So, the reason for the difference in the definition of === is that two identical mutable structs can be distinguished, but two immutable ones cannot.
In Julia, say I have an object_id for a variable but have forgotten its name, how can I retrieve the object using the id?
I.e. I want the inverse of some_id = object_id(some_object).
As @DanGetz says in the comments, object_id is a hash function and is designed not to be invertible. @phg is also correct that ObjectIdDict is intended precisely for this purpose (it is documented although not discussed much in the manual):
ObjectIdDict([itr])
ObjectIdDict() constructs a hash table where the keys are (always)
object identities. Unlike Dict it is not parameterized on its key and
value type and thus its eltype is always Pair{Any,Any}.
See Dict for further help.
In other words, it hashes objects by === using object_id as a hash function. If you have an ObjectIdDict and you use the objects you encounter as the keys into it, then you can keep them around and recover those objects later by taking them out of the ObjectIdDict.
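As a minimal sketch of such an explicit stash (stash and remember are illustrative names; a plain Dict keyed by the UInt id serves the same recovery purpose as the ObjectIdDict described above):
const stash = Dict{UInt,Any}()

remember(x) = (stash[object_id(x)] = x; x)   # keep the object alive, keyed by its id

some_object = [1, 2, 3]
remember(some_object)
some_id = object_id(some_object)

stash[some_id] === some_object   # true, for as long as the stash holds the object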
However, it sounds like you want to do this without the explicit ObjectIdDict just by asking which object ever created has a given object_id. If so, consider this thought experiment: if every object were always recoverable from its object_id, then the system could never discard any object, since it would always be possible for a program to ask for that object by ID. So you would never be able to collect any garbage, and the memory usage of every program would rapidly expand to use all of your RAM and disk space. This is equivalent to having a single global ObjectIdDict which you put every object ever created into. So inverting the object_id function that way would require never deallocating any objects, which means you'd need unbounded memory.
Even if we had infinite memory, there are deeper problems. What does it mean for an object to exist? In the presence of an optimizing compiler, this question doesn't have a clear-cut answer. It is often the case that an object appears, from the programmer's perspective, to be created and operated on, but in reality – i.e. from the hardware's perspective – it is never created. Consider this function which constructs a complex number and then uses it for a simple computation:
julia> function foo(y::Real)
z = Complex(0,y)
w = 2z*im
return real(w)
end
foo (generic function with 1 method)
julia> foo(123)
-246
From the programmer's perspective, this constructs the complex number z and then constructs 2z, then 2z*im, and finally constructs real(2z*im) and returns that value. So all of those values should be inserted into the "Great ObjectIdDict in the Sky". But are they really constructed? Here's the LLVM code for this function applied to an Int:
julia> @code_llvm foo(123)
define i64 @julia_foo_60833(i64) #0 !dbg !5 {
top:
%1 = shl i64 %0, 1
%2 = sub i64 0, %1
ret i64 %2
}
No Complex values are constructed at all! Instead, all of the work is inlined and then eliminated rather than actually being done. The whole computation boils down to just doubling the argument (by shifting it left one bit) and negating it (by subtracting it from zero). This optimization can be done first and foremost because the intermediate steps have no observable side effects. The compiler knows that there's no way to tell the difference between actually constructing complex values and operating on them and just doing a couple of integer ops – as long as the end result is always the same. Implicit in the idea of a "Great ObjectIdDict in the Sky" is the assumption that all objects that seem to be constructed actually are constructed and inserted into a large, permanent data structure – which is a massive side effect. So not only is recovering objects from their IDs incompatible with garbage collection, it's also incompatible with almost every conceivable program optimization.
The only other way one could conceive of inverting object_id would be to compute its inverse image on demand instead of saving objects as they are created. That would solve both the memory and optimization problems. Of course, it isn't possible since there are infinitely many possible objects but only a finite number of object IDs. You are vanishingly unlikely to actually encounter two objects with the same ID in a program, but the finiteness of the ID space means that inverting the hash function is impossible in principle since the preimage of each ID value contains an infinite number of potential objects.
I've probably refuted the possibility of an inverse object_id function far more thoroughly than necessary, but it led to some interesting thought experiments, and I hope it's been helpful – or at least thought provoking. The practical answer is that there is no way to get around explicitly stashing every object you might want to get back later in an ObjectIdDict.
This link: http://research.swtch.com/godata
It says (third paragraph of section Slices):
Because slices are multiword structures, not pointers, the slicing
operation does not need to allocate memory, not even for the slice
header, which can usually be kept on the stack. This representation
makes slices about as cheap to use as passing around explicit pointer
and length pairs in C. Go originally represented a slice as a pointer
to the structure shown above, but doing so meant that every slice
operation allocated a new memory object. Even with a fast allocator,
that creates a lot of unnecessary work for the garbage collector, and
we found that, as was the case with strings above, programs avoided
slicing operations in favor of passing explicit indices. Removing the
indirection and the allocation made slices cheap enough to avoid
passing explicit indices in most cases.
What...? Why does the slicing operation not allocate any memory? Whether it is a multiword structure or a pointer, doesn't it need to allocate memory? Then it mentions that a slice was originally represented as a pointer to that structure, and that every slice operation had to allocate memory for a new object. Why does it not need to do that now? Very confused.
To expand on Pravin Mishra's answer:
the slicing operation does not need to allocate memory.
"Slicing operation" refers to things like s1[x:y] and not slice initialization or make([]int, x). For example:
var s1 = []int{0, 1, 2, 3, 4, 5} // <<- allocates (or put on stack)
s2 := s1[1:3] // <<- does not (normally) allocate
That is, the second line is similar to:
type SliceHeader struct {
Data uintptr
Len int
Cap int
}
…
example := SliceHeader{uintptr(unsafe.Pointer(&s1[1])), 2, 5} // cast needs the unsafe package; illustrative only
Usually local variables like example get put onto the stack. It's just like if this was done instead of using a struct:
var exampleData uintptr
var exampleLen, exampleCap int
Those example* variables go onto the stack.
Only if the code does return &example or otherFunc(&example) or otherwise allows a pointer to this to escape will the compiler be forced to allocate the struct (or slice header) on the heap.
Then it mentions that it was originally a pointer to that slice structure, and it needed to allocate memory for a new object. Why does it not need to do that now?
Imagine that instead of the above you did:
example2 := &SliceHeader{…same…}
// or
example3 := new(SliceHeader)
example3.Data = …
example3.Len = …
example3.Cap = …
i.e. the type is *SliceHeader rather than SliceHeader.
This is effectively what slices used to be (pre Go 1.0) according to what you mention.
It also used to be that both example2 and example3 would have to be allocated on the heap. That is the "memory for a new object" being referred to. I think that now escape analysis will try to put both of these onto the stack as long as the pointer(s) are kept local to the function, so it's not as big of an issue anymore. Either way though, avoiding one level of indirection is good: it's almost always faster to copy three ints than to copy a pointer and dereference it repeatedly.
Every data type allocates memory when it's initialized. In the blog, he clearly mentions:
the slicing operation does not need to allocate memory.
And he is right. Now see how slices work in Go:
Slices hold references to an underlying array, and if you assign one
slice to another, both refer to the same array. If a function takes a
slice argument, changes it makes to the elements of the slice will be
visible to the caller, analogous to passing a pointer to the
underlying array.