Slices in golang do not allocate any memory?

This link: http://research.swtch.com/godata
It says (third paragraph of section Slices):
Because slices are multiword structures, not pointers, the slicing
operation does not need to allocate memory, not even for the slice
header, which can usually be kept on the stack. This representation
makes slices about as cheap to use as passing around explicit pointer
and length pairs in C. Go originally represented a slice as a pointer
to the structure shown above, but doing so meant that every slice
operation allocated a new memory object. Even with a fast allocator,
that creates a lot of unnecessary work for the garbage collector, and
we found that, as was the case with strings above, programs avoided
slicing operations in favor of passing explicit indices. Removing the
indirection and the allocation made slices cheap enough to avoid
passing explicit indices in most cases.
What...? Why does it not allocate any memory? Whether it is a multiword structure or a pointer, doesn't it need to allocate memory? Then it mentions that it was originally a pointer to that slice structure, and it needed to allocate memory for a new object. Why does it not need to do that now? Very confused.

To expand on Pravin Mishra's answer:
the slicing operation does not need to allocate memory.
"Slicing operation" refers to things like s1[x:y] and not slice initialization or make([]int, x). For example:
var s1 = []int{0, 1, 2, 3, 4, 5} // <<- allocates (or put on stack)
s2 := s1[1:3] // <<- does not (normally) allocate
That is, the second line is similar to:
type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}
…
// Data is a uintptr, so the element address needs an explicit conversion:
example := SliceHeader{uintptr(unsafe.Pointer(&s1[1])), 2, 5}
Usually local variables like example get put onto the stack. It's just as if this had been done instead of using a struct:
var exampleData uintptr
var exampleLen, exampleCap int
Those example* variables go onto the stack.
Only if the code does return &example or otherFunc(&example) or otherwise allows a pointer to this to escape will the compiler be forced to allocate the struct (or slice header) on the heap.
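To make this concrete, here is a small sketch (the function names are mine, not from the article). Running go build -gcflags=-m on it makes the compiler report which headers escape:

package main

// sliceLocal slices its argument; the header for t is just three
// words on the stack, so no allocation happens here.
func sliceLocal(s []int) int {
    t := s[1:3]
    return len(t)
}

// sliceEscapes lets a pointer to the header escape, so the compiler
// must move the header itself to the heap (-gcflags=-m prints
// "moved to heap: t").
func sliceEscapes(s []int) *[]int {
    t := s[1:3]
    return &t
}

func main() {
    s := []int{0, 1, 2, 3, 4, 5}
    _ = sliceLocal(s)
    _ = sliceEscapes(s)
}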
Then it mentions that it was originally a pointer to that slice structure, and it needed to allocate memory for a new object. Why does it not need to do that now?
Imagine that instead of the above you did:
example2 := &SliceHeader{…same…}
// or
example3 := new(SliceHeader)
example3.Data = …
example3.Len = …
example3.Cap = …
i.e. the type is *SliceHeader rather than SliceHeader.
This is effectively what slices used to be (pre Go 1.0) according to what you mention.
It also used to be that both example2 and example3 would have to be allocated on the heap. That is the "memory for a new object" being referred to. I think that now escape analysis will try to put both of these onto the stack as long as the pointer(s) are kept local to the function, so it's not as big of an issue anymore. Either way though, avoiding one level of indirection is good; it's almost always faster to copy three ints than to copy a pointer and dereference it repeatedly.
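As a rough way to verify that claim (a minimal benchmark sketch; all names here are mine), you can compare passing the slice header by value against passing a pointer to it and dereferencing on each access:

package main

import "testing"

var sink int // prevents the compiler from optimizing the calls away

func sumByValue(s []int) int {
    total := 0
    for _, v := range s {
        total += v
    }
    return total
}

func sumByPointer(s *[]int) int {
    total := 0
    // Dereference on each access; the compiler may hoist some of
    // this, but it illustrates the extra indirection.
    for i := 0; i < len(*s); i++ {
        total += (*s)[i]
    }
    return total
}

func BenchmarkByValue(b *testing.B) {
    s := make([]int, 1000)
    for i := 0; i < b.N; i++ {
        sink = sumByValue(s)
    }
}

func BenchmarkByPointer(b *testing.B) {
    s := make([]int, 1000)
    for i := 0; i < b.N; i++ {
        sink = sumByPointer(&s)
    }
}

Put this in a file ending in _test.go and run go test -bench=. to compare.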

Every data type allocates memory when it's initialized. In the blog post, he clearly mentions:
the slicing operation does not need to allocate memory.
And he is right. Now see how slices work in Go.
Slices hold references to an underlying array, and if you assign one
slice to another, both refer to the same array. If a function takes a
slice argument, changes it makes to the elements of the slice will be
visible to the caller, analogous to passing a pointer to the
underlying array.
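A short example of that shared-underlying-array behavior (a sketch; the names are mine):

package main

import "fmt"

// double modifies the elements of the slice it receives; the caller
// sees the changes because both slice headers point at the same
// underlying array.
func double(s []int) {
    for i := range s {
        s[i] *= 2
    }
}

func main() {
    nums := []int{1, 2, 3}
    double(nums)
    fmt.Println(nums) // [2 4 6]
}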

Related

How can I load all entries of a Vec<T> of arbitrary length onto the stack?

I am currently working with vectors and trying to ensure I have what is essentially an array of my vector on the stack. I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec. Is this at all possible?
Having read the Rustonomicon on how to implement Vec, it seems to stride over pointers on the heap, dereferencing at each entry. I want to chunk in Vec entries from the heap into the stack for fast access.
You can use the unsized_locals feature in nightly Rust:
#![feature(unsized_locals)]

fn example<T>(v: Vec<T>) {
    let s: [T] = *v.into_boxed_slice();
    dbg!(std::mem::size_of_val(&s));
}

fn main() {
    let x = vec![42; 100];
    example(x); // Prints 400
}
See also:
Is there a good way to convert a Vec<T> to an array?
How to get a slice as an array in Rust?
I cannot call Vec::into_boxed_slice since I am dynamically allocating space in my Vec
Sure you can.
Vec [...] seems to stride over pointers on the heap, dereferencing at each entry
Accessing each member in a Vec requires a memory dereference. Accessing each member in an array requires a memory dereference. There's no material difference in speed here.
for fast access
I doubt this will be any faster than directly accessing the data in the Vec. In fact, I wouldn't be surprised if it were slower, since you are copying it.

How to create a RAWSXP vector from C char* ptr without reallocation

Is there a way of creating a RAWSXP vector that is backed by an existing C char* ptr?
Below I show my current working version which needs to reallocate and copy the bytes,
and a second imagined version that doesn't exist.
// My current slow solution that uses lots of memory
SEXP getData() {
    // has size, and data
    Response resp = expensive_call();
    // COPY OVER BYTE BY BYTE
    SEXP respVec = Rf_allocVector(RAWSXP, resp.size);
    Rbyte* ptr = RAW(respVec);
    memcpy(ptr, resp.data, resp.size);
    // free the memory
    free(resp.data);
    return respVec;
}

// My imagined solution
SEXP getDataFast() {
    // has size, and data
    Response resp = expensive_call();
    // reuse the ptr
    SEXP respVec = Rf_allocVectorViaPtr(RAWSXP, resp.data, resp.size);
    return respVec;
}
I also noticed Rf_allocVector3, which seems to give control over memory allocations of the vector, but I couldn't get this to work. This is my first time writing an R extension, so I imagine I must be doing something stupid. I'm trying to avoid the copy as the data will be around a GB (very large, though sparse, matrices).
Copying 1 GB takes under a second. If your call is expensive, the copy might be a marginal cost; profile to see if it's really a bottleneck.
The way you are trying to do things is probably not possible, because how would R know how to garbage collect the data?
But assuming you are using STL containers, one neat trick I've recently seen is to use the second template argument of STL containers -- the allocator.
template<
    class T,
    class Allocator = std::allocator<T>
> class vector;
The general outline of the strategy is like this:
Create a custom allocator using R memory that meets all the requirements (essentially you just need allocate and deallocate)
Every time you need to return data to R from an STL container, make sure you initialize it with your custom allocator
On returning the data, pull out the underlying R data created by your R-memory allocator -- no copy
This approach gives you all the flexibility of STL containers while using only memory R is aware of.

What is the core difference between t=&T{} and t=new(T)

It seems that both ways create a new object pointer with all zero-valued members, and both return a pointer:
type T struct{}
...
t1 := &T{}
t2 := new(T)
So what is the core difference between t1 and t2, or is there anything that "new" can do while &T{} cannot, or vice versa?
[…] is there anything that "new" can do while &T{} cannot, or vice versa?
I can think of three differences:
The "composite literal" syntax (the T{} part of &T{}) only works for "structs, arrays, slices, and maps" [link], whereas the new function works for any type [link].
For a struct or array type, the new function always generates zero values for its elements, whereas the composite literal syntax lets you initialize some of the elements to non-zero values if you like.
For a slice or map type, the new function always returns a pointer to a nil slice or map, whereas the composite literal syntax always returns an initialized slice or map. (For maps this is very significant, because you can't add elements to a nil map.) Furthermore, the composite literal syntax can even create a non-empty slice or map.
(The second and third bullet-points are actually two aspects of the same thing — that the new function always creates zero values — but I list them separately because the implications are a bit different for the different types. The sketch below illustrates both.)
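A minimal runnable sketch of the second and third differences (the struct type S is my own illustration, not from the question):

package main

import "fmt"

// S is a hypothetical struct with fields, so that initialization
// can be demonstrated.
type S struct{ A, B int }

func main() {
    // A composite literal can set fields; new cannot.
    s1 := &S{A: 7}
    s2 := new(S)          // always the zero value
    fmt.Println(*s1, *s2) // {7 0} {0 0}

    // For maps, new yields a pointer to a nil map, while a
    // composite literal yields an initialized, usable map.
    mp := new(map[string]int)
    fmt.Println(*mp == nil) // true; inserting into *mp would panic
    ml := map[string]int{"x": 1} // initialized, even non-empty
    ml["y"] = 2                  // fine
    fmt.Println(ml)              // map[x:1 y:2]
}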
For structs and arrays, both are the same:
t1 := &T{}
t2 := new(T)
// Both are the same
You cannot return the address of an unnamed variable initialised to the zero value of other basic types like int without using new. You would need to create a named variable and then take its address:

// With new:
func newInt() *int {
    return new(int)
}

// The equivalent without new:
func newInt() *int {
    // return &int{} --> invalid: int is not a composite type
    var dummy int
    return &dummy
}
See ruakh's answer. I want to point out some of the internal implementation details, though. You should not make use of them in production code, but they help illuminate what really happens behind the scenes, in the Go runtime.
Essentially, a slice is represented by three values. The reflect package exports a type, SliceHeader:
SliceHeader is the runtime representation of a slice. It cannot be used safely or portably and its representation may change in a later release. Moreover, the Data field is not sufficient to guarantee the data it references will not be garbage collected, so programs must keep a separate, correctly typed pointer to the underlying data.
type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}
If we use this to inspect a variable of type []T (for any type T), we can see the three parts: the pointer to the underlying array, the length, and the capacity. Internally, a slice value v always has all three of these parts. There's a general condition that I think should hold, and if you don't use unsafe to break it, it seems by inspection that it will hold (based on limited testing anyway):
either the Data field is not zero (in which case Len and Cap can but need not be nonzero), or
the Data field is zero (in which case the Len and Cap should both be zero).
That slice value v is nil if the Data field is zero.
By using the unsafe package, we can break it deliberately (and then put it all back—and hopefully nothing goes wrong while we have it broken) and thus inspect the pieces. When this code on the Go Playground is run (there's a copy below as well), it prints:
via &literal: base of array is 0x1e52bc; len is 0; cap is 0.
Go calls this non-nil.
via new: base of array is 0x0; len is 0; cap is 0.
Go calls this nil even though we clobbered len() and cap()
Making it non-nil by unsafe hackery, we get [42] (with cap=1).
after setting *p1=nil: base of array is 0x0; len is 0; cap is 0.
Go calls this nil even though we clobbered len() and cap()
Making it non-nil by unsafe hackery, we get [42] (with cap=1).
The code itself is a bit long so I have left it to the end (or use the above link to the Playground). But it shows that the actual p == nil test in the source compiles to just an inspection of the Data field.
When you do:
p2 := new([]int)
the new function actually allocates only the slice header. It sets all three parts to zero and returns the pointer to the resulting header. So *p2 has three zero fields in it, which makes it a correct nil value.
On the other hand, when you do:
p1 := &[]int{}
the Go compiler builds an empty array (of size zero, holding zero ints) and then builds a slice header: the pointer part points to the empty array, and the length and capacity are set to zero. Then p1 points to this header, with the non-nil Data field. A later assignment, *p1 = nil, writes zeros into all three fields.
Let me repeat this with boldface: these are not promised by the language specification, they're just the actual implementation in action.
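That said, the nil-ness difference itself is observable from ordinary, non-unsafe code. A quick sketch:

package main

import "fmt"

func main() {
    p1 := &[]int{}   // header with a non-nil Data pointer
    p2 := new([]int) // header with all three fields zero
    fmt.Println(*p1 == nil, *p2 == nil) // false true
}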
Maps work very similarly. A map variable is actually a pointer to a map header. The details of map headers are even less accessible than those of slice headers: there is no reflect type for them. The actual implementation is viewable here under type hmap (note that it is not exported).
What this means is that m2 := new(map[T1]T2) really only allocates one pointer, and sets that pointer itself to nil. There is no actual map! The new function returns the address of that nil map pointer, so *m2 is a nil map. Likewise var m1 map[T1]T2 just sets a simple pointer value in m1 to nil. But m3 := map[T1]T2{} allocates an actual hmap structure, fills it in, and makes m3 point to it. We can once again peek behind the curtain on the Go Playground, with code that is not guaranteed to work tomorrow, to see this in effect.
As someone writing Go programs, you don't need to know any of this. But if you have worked with lower-level languages (assembly and C for instance), these explain a lot. In particular, these explain why you cannot insert into a nil map: the map variable itself holds a pointer value, and until the map variable itself has a non-nil pointer to a (possibly empty) map-header, there is no way to do the insertion. An insertion could allocate a new map and insert the data, but the map variable wouldn't point to the correct hmap header object.
(The language authors could have made this work by using a second level of indirection: a map variable could be a pointer pointing to the variable that points to the map header. Or they could have made map variables always point to a header, and made new actually allocate a header, the way make does; then there would never be a nil map. But they didn't do either of these, and we get what we get, which is fine: you just need to know to initialize the map.)
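For instance, this minimal sketch shows the practical consequence (reads on a nil map are fine; inserts panic):

package main

import "fmt"

func main() {
    var m map[string]int // nil: the variable holds a nil pointer

    fmt.Println(m["missing"]) // reading a nil map works: prints 0

    // m["key"] = 1 // would panic: assignment to entry in nil map

    m = map[string]int{} // m now points at a real (empty) map header
    m["key"] = 1         // works
    fmt.Println(m)       // map[key:1]
}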
Here's the slice inspector. (Use the playground link to view the map inspector: given that I had to copy hmap's definition out of the runtime, I expect it to be particularly fragile and not worth showing. The slice header's structure seems far less likely to change over time.)
package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    p1 := &[]int{}
    p2 := new([]int)
    show("via &literal", *p1)
    show("\nvia new", *p2)
    *p1 = nil
    show("\nafter setting *p1=nil", *p1)
}

// This demonstrates that given a slice (p), the test
//     if p == nil
// is really a test on p.Data. If it's zero (nil),
// the slice as a whole is nil. If it's nonzero, the
// slice as a whole is non-nil.
func show(what string, p []int) {
    pp := unsafe.Pointer(&p)
    sh := (*reflect.SliceHeader)(pp)
    fmt.Printf("%s: base of array is %#x; len is %d; cap is %d.\n",
        what, sh.Data, sh.Len, sh.Cap)
    olen, ocap := len(p), cap(p)
    sh.Len, sh.Cap = 1, 1 // evil
    if p == nil {
        fmt.Println(" Go calls this nil even though we clobbered len() and cap()")
        answer := 42
        sh.Data = uintptr(unsafe.Pointer(&answer))
        fmt.Printf(" Making it non-nil by unsafe hackery, we get %v (with cap=%d).\n",
            p, cap(p))
        sh.Data = 0 // restore nil-ness
    } else {
        fmt.Println("Go calls this non-nil.")
    }
    sh.Len, sh.Cap = olen, ocap // undo evil
}

How can I retrieve an object by id in Julia

In Julia, say I have an object_id for a variable but have forgotten its name, how can I retrieve the object using the id?
I.e. I want the inverse of some_id = object_id(some_object).
As @DanGetz says in the comments, object_id is a hash function and is designed not to be invertible. @phg is also correct that ObjectIdDict is intended precisely for this purpose (it is documented, although not discussed much in the manual):
ObjectIdDict([itr])
ObjectIdDict() constructs a hash table where the keys are (always)
object identities. Unlike Dict it is not parameterized on its key and
value type and thus its eltype is always Pair{Any,Any}.
See Dict for further help.
In other words, it hashes objects by === using object_id as a hash function. If you have an ObjectIdDict and you use the objects you encounter as the keys into it, then you can keep them around and recover those objects later by taking them out of the ObjectIdDict.
However, it sounds like you want to do this without the explicit ObjectIdDict just by asking which object ever created has a given object_id. If so, consider this thought experiment: if every object were always recoverable from its object_id, then the system could never discard any object, since it would always be possible for a program to ask for that object by ID. So you would never be able to collect any garbage, and the memory usage of every program would rapidly expand to use all of your RAM and disk space. This is equivalent to having a single global ObjectIdDict which you put every object ever created into. So inverting the object_id function that way would require never deallocating any objects, which means you'd need unbounded memory.
Even if we had infinite memory, there are deeper problems. What does it mean for an object to exist? In the presence of an optimizing compiler, this question doesn't have a clear-cut answer. It is often the case that an object appears, from the programmer's perspective, to be created and operated on, but in reality – i.e. from the hardware's perspective – it is never created. Consider this function which constructs a complex number and then uses it for a simple computation:
julia> function f(y::Real)
           z = Complex(0,y)
           w = 2z*im
           return real(w)
       end
f (generic function with 1 method)

julia> f(123)
-246
From the programmer's perspective, this constructs the complex number z and then constructs 2z, then 2z*im, and finally constructs real(2z*im) and returns that value. So all of those values should be inserted into the "Great ObjectIdDict in the Sky". But are they really constructed? Here's the LLVM code for this function applied to an Int:
julia> @code_llvm f(123)

define i64 @julia_f_60833(i64) #0 !dbg !5 {
top:
  %1 = shl i64 %0, 1
  %2 = sub i64 0, %1
  ret i64 %2
}
No Complex values are constructed at all! Instead, all of the work is inlined and eliminated instead of actually being done. The whole computation boils down to just doubling the argument (by shifting it left one bit) and negating it (by subtracting it from zero). This optimization can be done first and foremost because the intermediate steps have no observable side effects. The compiler knows that there's no way to tell the difference between actually constructing complex values and operating on them and just doing a couple of integer ops – as long as the end result is always the same. Implicit in the idea of a "Great ObjectIdDict in the Sky" is the assumption that all objects that seem to be constructed actually are constructed and inserted into a large, permanent data structure – which is a massive side effect. So not only is recovering objects from their IDs incompatible with garbage collection, it's also incompatible with almost every conceivable program optimization.
The only other way one could conceive of inverting object_id would be to compute its inverse image on demand instead of saving objects as they are created. That would solve both the memory and optimization problems. Of course, it isn't possible since there are infinitely many possible objects but only a finite number of object IDs. You are vanishingly unlikely to actually encounter two objects with the same ID in a program, but the finiteness of the ID space means that inverting the hash function is impossible in principle since the preimage of each ID value contains an infinite number of potential objects.
I've probably refuted the possibility of an inverse object_id function far more thoroughly than necessary, but it led to some interesting thought experiments, and I hope it's been helpful – or at least thought provoking. The practical answer is that there is no way to get around explicitly stashing every object you might want to get back later in an ObjectIdDict.

Is it ok to create big array of AVX/SSE values

I am parallelizing a certain dynamic programming problem using AVX2/SSE instructions.
In the main iteration of my calculation, I calculate a column in a matrix where each cell is a structure of AVX2 registers (__m256i). I use values from the previous matrix column as input values for calculating the current column. Columns can be big, so what I do is have an array of structures (on the stack), where each structure has two __m256i elements.
Structure:
struct Cell {
    __m256i first;
    __m256i second;
};
And then I have an array like this: Cell prevColumn[N]. N will typically be a few hundred.
I know that __m256i basically represents an AVX2 register, so I am wondering how I should think about this array and how it behaves, since N is much larger than 16 (which is the number of AVX registers). Is it good practice to create such an array, or is there some better approach I should use when storing a lot of __m256i values that are going to be reused soon?
Also, is there any alignment I should be doing with these structures? I have read a lot about alignment, but I am still not sure how and when to do it exactly.
It's better to structure your code to do everything it can with a value before moving on. Small buffers that fit in L1 cache aren't going to be too bad for performance, but don't do that unless you need to.
I think it's more typical to write your code with buffers of int[] type rather than __m256i type, but I'm not sure. Either way works, and should get the compiler to generate efficient code. But the int[] way means less code has to be different for the SSE, AVX2, and AVX512 versions. And it might make it easier to examine things with a debugger, to have your data in an array with a type that will get the data formatted nicely.
As I understand it, the load/store intrinsics are partly there as a cast between __m256i and int[], since AVX doesn't fault on unaligned access, it just slows down across cache-line boundaries. Assigning to / from an array of __m256i should work fine, and generate load/store instructions where needed, otherwise generate vector instructions with memory source operands (for more compact code and fewer fused-domain uops).
