What is the motivation behind this "pattern"? - pointers

I'm a bit confused when I see code such as:
bigBox := &BigBox{}
bigBox.BubbleGumsCount = 4 // correct...
bigBox.SmallBox.AnyMagicItem = true // also correct
Why, or when, would I want to do bigBox := &BigBox{} instead of bigBox := BigBox{} ? Is it more efficient in some way?
Code sample was taken from here.
Sample no.2:
package main

import "fmt"

type Ints struct {
    x int
    y int
}

func build_struct() Ints {
    return Ints{0, 0}
}

func build_pstruct() *Ints {
    return &Ints{0, 0}
}

func main() {
    fmt.Println(build_struct())
    fmt.Println(build_pstruct())
}
Sample no. 3 (why would I go with &BigBox in this example, and not with BigBox as a struct directly?):
func main() {
    bigBox := &BigBox{}
    bigBox.BubbleGumsCount = 4
    fmt.Println(bigBox.BubbleGumsCount)
}
Is there ever a reason to call build_pstruct instead of the build_struct variant? Isn't that why we have the GC?

I figured out one motivation for this kind of code: avoidance of "struct copying by accident".
If you use a struct variable to hold the newly created struct:
bigBox := BigBox{}
you may copy the struct by accident like this
myBox := bigBox // Where you just want a reference to bigBox.
myBox.BubbleGumsCount = 4
or like this
changeBoxColorToRed(bigBox)
where changeBoxColorToRed is
// It makes a copy of the entire struct passed as the parameter.
func changeBoxColorToRed(box BigBox) {
    // !!!! This function is buggy. It won't work as expected !!!
    // Please see the fix at the end.
    box.Color = red
}
But if you use a struct pointer:
bigBox := &BigBox{}
there will be no copying in
myBox := bigBox
and
changeBoxColorToRed(bigBox)
will fail to compile, giving you a chance to rethink the design of changeBoxColorToRed. The fix is obvious:
func changeBoxColorToRed(box *BigBox) {
    box.Color = red
}
The new version of changeBoxColorToRed does not copy the entire struct and works correctly.
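To make the difference concrete, here is a minimal runnable sketch of the same idea. Note that BigBox is never defined in the original post, so the fields (and the use of a plain string for Color) are assumptions made purely for illustration:

package main

import "fmt"

// Assumed definition: the original post does not show BigBox,
// so these fields are invented for the example.
type BigBox struct {
    Color           string
    BubbleGumsCount int
}

// Buggy version: the struct is copied, so the caller's box is untouched.
func changeBoxColorToRedByValue(box BigBox) {
    box.Color = "red"
}

// Fixed version: the pointer lets the function modify the caller's box.
func changeBoxColorToRed(box *BigBox) {
    box.Color = "red"
}

func main() {
    bigBox := &BigBox{}

    changeBoxColorToRedByValue(*bigBox)
    fmt.Println(bigBox.Color) // still "" – only the copy was changed

    changeBoxColorToRed(bigBox)
    fmt.Println(bigBox.Color) // "red"
}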

bb := &BigBox{} creates a struct, but sets the variable to be a pointer to it. It's the same as bb := new(BigBox). On the other hand, bb := BigBox{} makes bb a variable of type BigBox directly. If you want a pointer (perhaps because you're going to use the data via a pointer), then it's better to make bb a pointer, otherwise you're going to be writing &bb a lot. If you're going to use the data as a struct directly, then you want bb to be a struct, otherwise you're going to be dereferencing with *bb.
It's off the point of the question, but it's usually better to create data in one go, rather than incrementally by creating the object and subsequently updating it.
bb := &BigBox{
    BubbleGumsCount: 4,
    SmallBox: SmallBox{
        AnyMagicItem: true,
    },
}
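As for the "writing &bb a lot" point: if the data is mostly consumed through a pointer, declaring the variable as a pointer saves the repeated &. A small sketch, assuming the same made-up BigBox definition as above (the ship function is invented for illustration):

func ship(b *BigBox) { /* ... */ }

func main() {
    bbValue := BigBox{BubbleGumsCount: 4}
    ship(&bbValue) // need & at every call site that wants a pointer

    bbPointer := &BigBox{BubbleGumsCount: 4}
    ship(bbPointer) // already a pointer, nothing extra to write
}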

The & operator takes the address of something. So it means "I want a pointer to" rather than "I want an instance of". The size of a variable containing a value depends on the size of the value, which could be large or small. The size of a variable containing a pointer is 8 bytes on a typical 64-bit platform.
Here are examples and their meanings:
bigBox0 := &BigBox{} // bigBox0 is a pointer to a new instance of BigBox
bigBox1 := BigBox{}  // bigBox1 contains an instance of BigBox
bigBox2 := bigBox1   // bigBox2 is a copy of bigBox1
bigBox3 := &bigBox1  // bigBox3 is a pointer to bigBox1
bigBox4 := *bigBox3  // bigBox4 is a copy of bigBox1, dereferenced from bigBox3 (a pointer)
Why would you want a pointer?
To prevent copying a large object when passing it as an argument to a function.
To modify the value when passing it as an argument to a function.
To keep a slice, backed by an array, small. [10]BigBox would take up "the size of BigBox" * 10 bytes, whereas [10]*BigBox would take up 8 bytes * 10. A slice, when resized, has to create a larger backing array once it reaches its capacity, which means the memory of the old array has to be copied to the new array (see the size sketch at the end of this answer).
Why would you not want to use a pointer?
If an object is small, it's better just to make a copy. Especially if it's <= 8 bytes.
Using pointers can create garbage, which has to be collected by the garbage collector. Go's garbage collector was, when this was written, a stop-the-world mark-and-sweep implementation: it had to freeze your application to collect the garbage, and the more garbage it had to collect, the longer that pause was. This individual, for example, experienced pauses of up to 10 seconds.
Copying an object uses the stack rather than the heap. The stack is almost always faster than the heap. You don't really have to think about stack vs. heap in Go, as the compiler decides what goes where, but you shouldn't ignore the distinction either. It really depends on the compiler implementation, but pointers can result in memory going on the heap, resulting in the need for garbage collection.
Direct memory access is faster. If you have a slice []BigBox and it doesn't change size it can be faster to access. []BigBox is faster to read, whereas []*BigBox is faster to resize.
My general advice is use pointers sparingly. Unless you're dealing with a very large object that needs to be passed around, it's often better to pass around a copy on the stack. Reducing garbage is a big deal. The garbage collector will get better, but you're better off by keeping it as low as possible.
As always test your application and profile it.
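To put numbers on the array/slice point above, here is a small sketch; the BigBox layout is invented just to make the sizes concrete, and the figures assume a 64-bit platform:

package main

import (
    "fmt"
    "unsafe"
)

// Invented layout, only to make the sizes concrete.
type BigBox struct {
    Color           [64]byte
    BubbleGumsCount int
}

func main() {
    var boxes [10]BigBox // ten full structs stored inline
    var ptrs [10]*BigBox // ten pointers, 8 bytes each on a 64-bit platform

    fmt.Println(unsafe.Sizeof(boxes)) // 10 * 72 = 720 with this layout
    fmt.Println(unsafe.Sizeof(ptrs))  // 10 * 8  = 80
}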

The difference is between creating a reference object (with the ampersand) vs. a value object (without the ampersand).
There's a nice explanation of the general concept of value vs. reference type passing here... What's the difference between passing by reference vs. passing by value?
There is some discussion of these concepts with regards to Go here... http://www.goinggo.net/2013/07/understanding-pointers-and-memory.html

In general there is no difference between a &BigBox{} and BigBox{}. The Go compiler is free to do whatever it likes as long as the semantics are correct.
func StructToStruct() {
    s := Foo{}
    StructFunction(&s)
}

func PointerToStruct() {
    p := &Foo{}
    StructFunction(p)
}

func StructToPointer() {
    s := Foo{}
    PointerFunction(&s)
}

func PointerToPointer() {
    p := &Foo{}
    PointerFunction(p)
}

// Passed as a pointer, but used as a struct.
func StructFunction(f *Foo) {
    fmt.Println(*f)
}

func PointerFunction(f *Foo) {
    fmt.Println(f)
}
Summary of the assembly:
StructToStruct: 13 lines, no allocation
PointerToStruct: 16 lines, no allocation
StructToPointer: 20 lines, heap allocated
PointerToPointer: 12 lines, heap allocated
With a perfect compiler the *ToStruct functions would be identical, as would the *ToPointer functions. Go's escape analysis is good enough to tell if a pointer escapes, even across module boundaries. Whichever way is most efficient is the way the compiler will do it.
If you're really into micro-optimization, note that Go is most efficient when the syntax lines up with the semantics (a struct used as a struct, a pointer used as a pointer). Or you can just forget about it, declare the variable the way it will be used, and you will be right most of the time.
Note: if Foo is really big, PointerToStruct will heap-allocate it. The spec threatens that even StructToStruct is allowed to do this, but I couldn't make it happen. The lesson here is that the compiler will do whatever it wants. Just as the details of the registers are shielded from the code, so is the state of the heap/stack. Don't change your code because you think you know how the compiler is going to use the heap.
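If you'd rather look at the compiler's decisions than reason about them, it can print its escape analysis with go build -gcflags='-m'. A minimal sketch (Foo's fields here are a stand-in, and the exact messages vary between Go versions):

// Build with: go build -gcflags='-m' .
// The compiler reports which values it moves to the heap.
package main

import "fmt"

type Foo struct{ A, B int }

func StructFunction(f *Foo)  { fmt.Println(*f) }
func PointerFunction(f *Foo) { fmt.Println(f) }

func main() {
    s := Foo{}
    StructFunction(&s) // the answer above observed no heap allocation here

    p := &Foo{}
    PointerFunction(p) // printing the pointer itself forces p to escape to the heap
}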

Related

Rust way to transfer ownership while guaranteeing no underlying data copy

I am a bit confused about how to transfer ownership without the overhead of actual data copy.
I have the following code. I am referring to underlying data copy by OS as memcopy.
fn main() {
    let v1 = Vec::from([1; 1024]);
    take_ownership_but_memcopies(v1);
    let v2 = Vec::from([2; 1024]);
    dont_memecopy_but_dont_take_ownership(&v2);
    let v3 = Vec::from([3; 1024]);
    take_ownership_dont_memcopy(???);
}

// Moves but memcopies all elements
fn take_ownership_but_memcopies(my_vec1: Vec<i32>) {
    println!("{:?}", my_vec1);
}

// Doesn't memcopy but doesn't take ownership
fn dont_memecopy_but_dont_take_ownership(my_vec2: &Vec<i32>) {
    println!("{:?}", my_vec2);
}

// Take ownership without the overhead of memcopy
fn take_ownership_dont_memcopy(my_vec3: ???) {
    println!("{:?}", my_vec3);
}
As I understand it, if I use a reference like with v2, I don't get ownership. If I pass it like v1, there could be a memcopy.
How should I transfer v3 to guarantee that there is no underlying memcopy by the OS?
Your understanding of what happens when you move a Vec is incorrect - it does not copy every element within the Vec!
To understand why, we need to take a step back and look at how a Vec is represented internally:
// This is slightly simplified, look at the source for more details!
struct Vec<T> {
    pointer: *mut T, // pointer to the data (on the heap)
    capacity: usize, // the current capacity of the Vec
    len: usize,      // the current number of elements in the Vec
}
While the Vec conceptually 'owns' the elements, they are not stored within the Vec struct - it only holds a pointer to that data. So when you move a Vec, it is only the pointer (plus the capacity and length) that gets copied.
If you are attempting to avoid copying altogether, as opposed to avoiding copying the contents of the Vec, that isn't really possible - in the semantics of the compiler, a move is a copy (just one that prevents you from using the old data afterwards). However, the compiler can and will optimize trivial copies into something more efficient.
How should I transfer v3 to guarantee that there is no underlying memcopy by the OS?
You can't, because those are Rust's semantics.
However, a Vec is just three words on the stack, and that is all that gets "memcopy"d; it's not like you're going to get a memcpy function call in there or duplicate the entire vector. And that's assuming the call doesn't get inlined and the compiler doesn't decide to pass the object by reference anyway; it could also pass all three words through registers, at which point there's nothing to memcpy.
Though it's not entirely clear why you care either way, if you only want to read from the collection your function should be
// Take ownership without the overhead of memcopy
fn take_ownership_dont_memcopy(my_vec3: &[i32]) {
    println!("{:?}", my_vec3);
}
that is the most efficient and flexible signature: it's just two words, there's a single level of indirection (unlike &Vec), and it allows for non-Vec sources.

Why is a global vector of pointers (without objects) in C++ leaking in Valgrind when it won't let me use delete after new in two separated files?

So I have a struct:
typedef struct {
    int x = 0;
} Command;
and global vectors:
vector<Command> cmdList = {};
vector<Event*> eventList = {};
I push_back, erase and clear the vector in another .cpp file. This gets pushed back into:
vector<Command> cmdsToExec = {}; inside each Event struct created. I use this to push_back:
eventList.push_back( new Event() );
eventList[int( eventList.size() ) - 1]->cmdsToExec = cmdList;
My problems are: A) these Event*s can't be erased with delete, and B) Valgrind gives this error while trying to determine the size of cmdsToExec:
==25096== Invalid read of size 8
==25096== at 0x113372: std::vector<Command, std::allocator<Command> >::size() const (stl_vector.h:919)
==25096== by 0x11C1C7: eventHandler::processEvent() (eventHandler.cpp:131)
==25096== by 0x124590: main (main.cpp:88)
==25096== Address 0x630a9e0 is 32 bytes inside a block of size 56 free'd
==25096== at 0x484BB6F: operator delete(void*, unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25096== by 0x11C116: eventHandler::processEvent() (eventHandler.cpp:222)
==25096== by 0x124590: main (main.cpp:88)
==25096== Block was alloc'd at
==25096== at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==25096== by 0x11B4A5: eventHandler::createEvent() (eventHandler.cpp:58)
==25096== by 0x11B412: eventHandler::doState() (eventHandler.cpp:41)
==25096== by 0x124575: main (main.cpp:83)
I've tracked it to the line:
while( int( eventList[0]->cmdsToExec.size() ) > 0 ) {
I'm not trying to solve this specific problem; it's more about how to properly delete and deallocate a dynamic pointer from a global vector of dynamic pointers. That being said, there are no objects (and I want to keep it that way). Will I need a struct destructor (no pun intended)? Also, I don't believe the cmdList vector ever has a memory leak according to this error message, since I'm clearing it all at once.
My thoughts on fixing this are to place both global vectors into my main() function and pass them into the program from there. I thought it would be unnecessary to do this and would slow the program down. Thinking now, I guess it wouldn't.
My guess is that this is a problem related to the order of destruction of static/global objects.
C++ guarantees that, for a given translation unit (i.e., a .cpp source file), static/global objects are created in the order they are defined, and they are destroyed in the reverse order.
C++ gives no guarantee between different translation units.
My recommendations are:
Avoid statics/globals. Move them to be class members if possible.
If you have any dependencies between statics/globals then put them all in the same source file so that you have control over the order of their creation and destruction.

Vector contains data but reports length is 0, can be accessed by some functions

I've written a wrapper for a camera library in Rust that commands and operates a camera, and also saves an image to file using bindgen. Once I command an exposure to start (basically telling the camera to take an image), I can grab the image using a function of the form:
pub fn GetQHYCCDSingleFrame(
    handle: *mut qhyccd_handle,
    w: *mut u32,
    ...,
    imgdata: &mut [u8],
) -> u32 // (u32 is a retval)
In C++, this function was:
uint32_t STDCALL GetQHYCCDSingleFrame(qhyccd_handle *handle, ..., uint8_t *imgdata)
In C++, I could pass in a buffer of the form imgdata = new unsigned char[length_buffer] and the function would fill the buffer with image data from the camera.
In Rust, similarly, I can pass in a buffer in the form of a Vec: let mut buffer: Vec<u8> = Vec::with_capacity(length_buffer).
Currently, the way I have structured the code is that there is a main struct, with settings such as the width and height of image, the camera handle, and others, including the image buffer. The struct has been initialized as a mut as:
let mut main_settings = MainSettings {
    width: 9600,
    ...,
    buffer: Vec::with_capacity(length_buffer),
};
There is a separate function I wrote that takes the main struct as a parameter and calls the GetQHYCCDSingleFrame function:
fn grab_image(main_settings: &mut MainSettings) {
    let retval = unsafe { GetQHYCCDSingleFrame(main_settings.cam_handle, ..., &mut main_settings.image_buffer) };
}
Immediately after calling this function, if I check the length and capacity of main_settings.image_buffer:
println!("Elements in buffer are {}, capacity of buffer is {}.", main_settings.image_buffer.len(), main_settings.image_buffer.capacity());
I get 0 for the length, and length_buffer as the capacity. Similarly, printing any index such as main_settings.image_buffer[0] or [1] leads to a panic saying the len is 0.
This would make me think that the GetQHYCCDSingleFrame call is not working properly; however, when I save the image_buffer to file using fitsio and hdu.write_region (fitsio docs linked here), I use:
let ranges = [&(x_start..(x_start + roi_width)), &(y_start..(y_start+roi_height))];
hdu.write_region(&mut fits_file, &ranges, &main_settings.image_buffer).expect("Could not write to fits file");
This saves an actual image to file with the right size, and it is a perfectly fine image (exactly what it would look like if I took it using the C++ program). However, when I try to print the buffer, it is for some reason empty, yet the hdu.write_region code is able to access the data somehow.
Currently, my (not good) workaround is to create another vector that reads data from the saved file and saves to a buffer, which then has the right number of elements:
main_settings.new_buffer = hdu.read_region(&mut fits_file, &ranges).expect("Couldn't read fits file");
Why can I not access the original buffer at all, and why does it report length 0, when the hdu.write_region function can access the data from somewhere? Where exactly is it accessing the data from, and how can I correctly access it as well? I am a bit new to borrowing and referencing, so I believe I might be doing something wrong in borrowing/referencing the buffer, or is it something else?
Sorry for the long story, but the details would probably be important for everything here. Thanks!
Well, first of all, you need to know that Vec<u8> and &mut [u8] are not quite the same as C or C++'s uint8_t *. The main difference is that Vec<u8> and &mut [u8] have the size of the array or slice saved within themselves, while uint8_t * doesn't. The Rust equivalent of C/C++ pointers are raw pointers, like *mut [u8]. Raw pointers are safe to build, but require unsafe to be used. However, even though they are different types, a reference such as &mut [u8] can be cast to a raw pointer without issue AFAIK.
Secondly, the capacity of a Vec is different from its length. Indeed, to get good performance, a Vec allocates more memory than is currently used, to avoid reallocating on every new element added to the vector. The length, however, is the size of the used part. In your case, you ask the Vec to allocate heap space for length_buffer elements, but you don't tell it to consider any of that allocated space as used, so the initial length is 0. Since the C++ side doesn't know about Vec and only uses a raw pointer, it can't change the length stored inside the Vec, which stays at 0. Hence the panic.
To resolve it, I see multiple solutions:
Changing the Vec::with_capacity(length_buffer) into vec![0; length_buffer], explicitly asking to have a length of length_buffer from the start
Using unsafe code to explicitly set the length of the Vec without touching what is inside (using Vec::from_raw_parts). This might be faster than the first solution, but I'm not sure.
Using a Box<[u8; length_buffer]>, which is like a Vec but without reallocation and whose length is its capacity
If your length_buffer is constant at compile time, using a [u8; length_buffer] would be much more efficient as no allocation is needed, but it comes with downsides, as you probably know

How to explain this strange phenomenon about pointer of slice in Golang? [duplicate]

Okay, it's hard to describe in words, but let's say I have a map that stores int pointers, and I want to store the result of an operation as another key in my map:
m := make(map[string]*int)
m["d"] = &(*m["x"] + *m["y"])
This doesn't work and gives me the error: cannot take the address of *m["x"] & *m["y"]
Thoughts?
A pointer is a memory address. For example a variable has an address in memory.
The result of an operation like 3 + 4 does not have an address because there is no specific memory allocated for it. The result may just live in processor registers.
You have to allocate memory whose address you can put into the map. The easiest and most straightforward is to create a local variable for it.
See this example:
x, y := 1, 2
m := map[string]*int{"x": &x, "y": &y}
d := *m["x"] + *m["y"]
m["d"] = &d
fmt.Println(m["d"], *m["d"])
Output (try it on the Go Playground):
0x10438300 3
Note: If the code above is in a function, the address of the local variable (d) that we just put into the map will continue to live even if we return from the function (that is if the map is returned or created outside - e.g. a global variable). In Go it is perfectly safe to take and return the address of a local variable. The compiler will analyze the code and if the address (pointer) escapes the function, it will automatically be allocated on the heap (and not on the stack). For details see FAQ: How do I know whether a variable is allocated on the heap or the stack?
Note #2: There are other ways to create a pointer to a value (as detailed in this answer: How do I do a literal *int64 in Go?), but they are just "tricks" and are not nicer or more efficient. Using a local variable is the cleanest and recommended way.
For example this also works without creating a local variable, but it's obviously not intuitive at all:
m["d"] = &[]int{*m["x"] + *m["y"]}[0]
Output is the same. Try it on the Go Playground.
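A related idiom that is not mentioned in the answer above but is common in Go code: a tiny helper function also produces an addressable value, because a function parameter is an ordinary local variable, and the compiler moves it to the heap when its address escapes. A minimal sketch (the intPtr name is just a conventional choice):

// intPtr returns a pointer to a fresh copy of i.
func intPtr(i int) *int { return &i }

// With it, the original one-liner becomes:
m["d"] = intPtr(*m["x"] + *m["y"])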
The result of the addition is placed somewhere transient (on the stack) and it would therefore not be safe to take its address. You should be able to work around this by explicitly allocating an int on the heap to hold your result:
result := new(int)
*result = *m["x"] + *m["y"]
m["d"] = result
In Go, you cannot take the address of a literal value (formally known as an r-value). Try the following:
package main

import "fmt"

func main() {
    x := 3
    y := 2
    m := make(map[string]*int)
    m["x"] = &x
    m["y"] = &y
    f := *m["x"] + *m["y"]
    m["d"] = &f
    fmt.Printf("Result: %d\n", *m["d"])
}
Have a look at this tutorial.

Is it safe to pass pointer of a local variable to a channel in Golang?

I have a code block that queries AD, retrieves the results, and writes them to a channel.
func GetFromAD(connect *ldap.Conn, ADBaseDN, ADFilter string, ADAttribute []string, ADPage uint32) *[]ADElement {
    searchRequest := ldap.NewSearchRequest(ADBaseDN, ldap.ScopeWholeSubtree, ldap.NeverDerefAliases, 0, 0, false, ADFilter, ADAttribute, nil)
    sr, err := connect.SearchWithPaging(searchRequest, ADPage)
    CheckForError(err)
    fmt.Println(len(sr.Entries))

    ADElements := []ADElement{}
    for _, entry := range sr.Entries {
        NewADEntity := new(ADElement) // struct
        NewADEntity.DN = entry.DN
        for _, attrib := range entry.Attributes {
            NewADEntity.attributes = append(NewADEntity.attributes, keyvalue{attrib.Name: attrib.Values})
        }
        ADElements = append(ADElements, *NewADEntity)
    }
    return &ADElements
}
The above function returns a pointer to a []ADElement.
And in my initialrun function, I call this function like
ADElements := GetFromAD(connectAD, ADBaseDN, ADFilter, ADAttribute, uint32(ADPage))
fmt.Println(reflect.TypeOf(ADElements))
ADElementsChan <- ADElements
And the output says
*[]somemodules.ADElement
as the output of reflect.TypeOf.
My doubt here is: since ADElements := []ADElement{} defined in GetFromAD() is a local variable, it must be allocated on the stack, and when GetFromAD() exits the contents of the stack must be destroyed, so further references to what GetFromAD() returned should point at invalid memory. Yet I still get the exact number of elements returned by GetFromAD() without any segfault. How is this working? Is it safe to do it this way?
Yes, it is safe, because the Go compiler performs escape analysis and allocates such variables on the heap.
Check out FAQ - How do I know whether a variable is allocated on the heap or the stack?
The storage location does have an effect on writing efficient programs. When possible, the Go compilers will allocate variables that are local to a function in that function's stack frame. However, if the compiler cannot prove that the variable is not referenced after the function returns, then the compiler must allocate the variable on the garbage-collected heap to avoid dangling pointer errors. Also, if a local variable is very large, it might make more sense to store it on the heap rather than the stack.
Define "safe"...
You will not end up freeing the memory of ADElements, since there's at least one live reference to it.
In this case, you should be completely safe, since you're only passing the slice once and then you seem to not modify it, but in the general case it might be better to pass it element-by-element across a chan ADElement, to avoid multiple unsynchronized accesses to the slice (or, more specifically, the array backing the slice).
This also holds for maps, where you can get curious problems if you pass a map over a channel, then continue to access it.
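For the element-by-element alternative mentioned above, here is a rough sketch. It reuses the types and helpers from the question (ADElement, keyvalue, CheckForError, and the ldap calls); the channel wiring itself is made up for illustration:

// Sends each element on the channel instead of sharing one slice.
func StreamFromAD(connect *ldap.Conn, ADBaseDN, ADFilter string, ADAttribute []string, ADPage uint32, out chan<- ADElement) {
    defer close(out)

    searchRequest := ldap.NewSearchRequest(ADBaseDN, ldap.ScopeWholeSubtree, ldap.NeverDerefAliases, 0, 0, false, ADFilter, ADAttribute, nil)
    sr, err := connect.SearchWithPaging(searchRequest, ADPage)
    CheckForError(err)

    for _, entry := range sr.Entries {
        elem := ADElement{DN: entry.DN}
        for _, attrib := range entry.Attributes {
            elem.attributes = append(elem.attributes, keyvalue{attrib.Name: attrib.Values})
        }
        out <- elem // each receiver gets its own copy; nothing is shared afterwards
    }
}

The receiver then simply ranges over the channel, so no two goroutines ever touch the same slice or backing array.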
