My erroneous code snippet and compiler error info:
// code snippet 1:
0 fn main() {
1 let mut x: Box<i32> = Box::new(4);
2 let r: &Box<i32> = &x;
3 *x = 8;
4 println!("{}", r);
5 }
// compiler error info:
error[E0506]: cannot assign to `*x` because it is borrowed
--> src/main.rs:3:4
|
2 | let r = &x;
| -- borrow of `*x` occurs here
3 | *x = 8;
| ^^^^^^ assignment to borrowed `*x` occurs here
4 | println!("{}", r);
| - borrow later used here
For more information about this error, try `rustc --explain E0506`.
The following code won't compile either, which makes sense to me because we cannot invalidate the reference r.
// code snippet 2:
fn main() {
    let mut x: i32 = 0;
    let r: &i32 = &x;
    x = 1;
    println!("{}", r);
}
But the compiler error for code snippet 1 doesn't make much sense to me.
x is a pointer on the stack pointing to a heap memory segment whose contents are 4. The reference r only borrows x (the pointer, not the heap memory segment), and in line 3, *x = 8;, what we do is alter the memory on the heap (not the pointer on the stack). The change happens on the heap, while the reference is only relevant to the stack, so they do not interrelate.
This question is kind of picking a quarrel, but I do not mean to argue for the sake of argument.
If you found my question irregular, feel free to point it out :)
Change happens on the heap, while reference is only relevant to the stack, they do not interrelate.
That does not matter, because the type system doesn't work with that "depth" of information.
As far as it's concerned, borrowing x is borrowing the entirety of x up to any depth, and so any change anywhere inside x is forbidden.
For type checking purposes, this is no different than if x were a Box<Vec<_>> and r were being actively used for iteration, in which case any update to the inner vector could invalidate the iterator.
(Also, type-wise, *x = 8 requires first taking a unique reference to the box itself, before "upgrading" it to a unique reference to the box's content, as you can see from the trait implementation.)
Rust's entire borrowing model enforces one simple requirement: the contents of a memory location can only be mutated if there is only one pointer through which that location can be accessed.
In your case, the heap location that you're trying to mutate can be accessed both through x and through r—and therefore mutation is denied.
This model enables the compiler to perform aggressive optimisations that permit, for example, the storage of values reachable through either alias in registers and/or caches without needing to fetch again from memory when the value is read.
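A minimal sketch of that rule in practice, assuming the only goal is to read through r before mutating: the shared borrow held by r ends at its last use, so doing the read before the assignment lets an otherwise identical program compile. The helper name read_then_write is hypothetical.

```rust
/// Hypothetical helper: reads through the shared borrow first, then mutates.
fn read_then_write() -> (i32, i32) {
    let mut x: Box<i32> = Box::new(4);
    let r: &Box<i32> = &x;
    let seen_through_r = **r; // last use of `r`; the shared borrow ends here
    *x = 8; // accepted: no shared borrow of `x` is live any more
    (seen_through_r, *x)
}

fn main() {
    let (before, after) = read_then_write();
    println!("read through r: {before}, after assignment: {after}");
}
```

Swapping the two middle statements back brings the E0506 error back, because r would then still be in use when *x is assigned.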
The semantics of * is determined by two traits:
pub trait Deref {
type Target: ?Sized;
fn deref(&self) -> &Self::Target;
}
and
pub trait DerefMut: Deref {
fn deref_mut(&mut self) -> &mut Self::Target;
}
In your case, when you write *x = 8, the Rust compiler conceptually expands the expression into
*DerefMut::deref_mut(&mut x) = 8, because Box<T> implements Deref<Target=T> and DerefMut. That is why the line *x = 8 performs a mutable borrow of x, and by the borrowing rules it can't be done, because x is already borrowed by let r: &Box<i32> = &x; and r is still used afterwards.
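As a rough illustration of that expansion, here is a sketch that writes the DerefMut call out by hand. The helper name assign_through_deref_mut is hypothetical, and the compiler actually treats dereferencing a Box as a built-in operation, so this only mirrors the conceptual desugaring.

```rust
use std::ops::DerefMut;

/// Hypothetical helper that spells out what `*x = value` means for a `Box<i32>`.
fn assign_through_deref_mut(x: &mut Box<i32>, value: i32) {
    // `deref_mut` takes `&mut Box<i32>` and hands back `&mut i32`.
    let target: &mut i32 = DerefMut::deref_mut(x);
    *target = value;
}

fn main() {
    let mut x: Box<i32> = Box::new(4);
    // The call needs a unique borrow of `x`, which is exactly what a live
    // shared reference such as `r` would rule out.
    assign_through_deref_mut(&mut x, 8);
    println!("{}", x);
}
```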
I just found a great diagram from Programming Rust (Version 2), which really answers my question quite well:
In the case of my question, when x is shared-referenced by r, everything in the ownership tree of x (the stack pointer and the heap memory segment) becomes read-only.
I know that the Stack Overflow community does not like pictures, but this diagram is really great and may help someone who finds this question in the future :)
I am a bit confused about how to transfer ownership without the overhead of actual data copy.
I have the following code. I am referring to underlying data copy by OS as memcopy.
fn main() {
let v1 = Vec::from([1; 1024]);
take_ownership_but_memcopies(v1);
let v2 = Vec::from([2; 1024]);
dont_memecopy_but_dont_take_ownership(&v2);
let v3 = Vec::from([3; 1024]);
take_ownership_dont_memcopy(???);
}
// Moves but memcopies all elements
fn take_ownership_but_memcopies(my_vec1: Vec<i32>) {
println!("{:?}", my_vec1);
}
// Doesn't memcopy but doesn't take ownership
fn dont_memecopy_but_dont_take_ownership(my_vec2: &Vec<i32>) {
println!("{:?}", my_vec2);
}
// Take ownership without the overhead of memcopy
fn take_ownership_dont_memcopy(my_vec3: ???) {
println!("{:?}", my_vec3);
}
As I understand it, if I use a reference as with v2, I don't get ownership. If I pass by value as with v1, there could be a memcopy.
How should I pass v3 to guarantee that there is no underlying memcopy by the OS?
Your understanding of what happens when you move a Vec is incorrect - it does not copy every element within the Vec!
To understand why, we need to take a step back and look at how a Vec is represented internally:
// This is slightly simplified, look at the source for more details!
struct Vec<T> {
pointer: *mut T, // pointer to the data (on the heap)
capacity: usize, // the current capacity of the Vec
len: usize, // the current number of elements in the Vec
}
While the Vec conceptually 'owns' the elements, they are not stored within the Vec struct - it only holds a pointer to that data. So when you move a Vec, it is only the pointer (plus the capacity and length) that gets copied.
If you are attempting to avoid copying altogether, as opposed to avoiding copying the contents of the Vec, that isn't really possible - in the semantics of the compiler, a move is a copy (just one that prevents you from using the old data afterwards). However, the compiler can and will optimize trivial copies into something more efficient.
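One way to check this yourself is to compare the address of the heap buffer before and after the move. This is only a sketch: the helper name heap_addr_after_move is hypothetical, and it relies on the documented layout of Vec as a (pointer, capacity, length) triplet.

```rust
use std::mem::size_of;

/// Hypothetical helper: takes ownership of the Vec and reports where its
/// heap buffer lives after the move.
fn heap_addr_after_move(v: Vec<i32>) -> usize {
    v.as_ptr() as usize
}

fn main() {
    let v3 = Vec::from([3; 1024]);
    let before = v3.as_ptr() as usize;
    let after = heap_addr_after_move(v3); // `v3` is moved here
    // The 1024 elements were not copied: the buffer is at the same address.
    println!("same heap buffer: {}", before == after);
    // Only this small handle is copied by the move.
    println!("size of the Vec handle: {} bytes", size_of::<Vec<i32>>());
}
```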
How should I pass v3 to guarantee that there is no underlying memcopy by the OS?
You can't. Because that's Rust's semantics.
However, a Vec is just 3 words on the stack, and that's all that gets "memcopy"d. That copy is intrinsic to the move; it's not like you're going to get a memcpy function call in there or duplicate the entire vector. And that's assuming the function call does not get inlined and the compiler does not decide to pass the object by reference anyway. It could also pass all 3 words through registers, at which point there's nothing to memcpy.
Though it's not entirely clear why you care either way, if you only want to read from the collection your function should be
// Borrow the elements for reading; nothing is copied and the caller keeps ownership
fn take_ownership_dont_memcopy(my_vec3: &[i32]) {
println!("{:?}", my_vec3);
}
that is the most efficient and flexible signature: it's just two words, there's a single pointer (unlike &Vec), and it allows for non-Vec sources.
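To illustrate the flexibility point, here is a small sketch of a read-only function taking &[i32] and being called with a Vec, an array, and a sub-slice. The function name sum is just an illustrative choice.

```rust
/// Read-only access through a slice; the caller keeps ownership.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let v3 = vec![3; 1024];
    let array = [1, 2, 3];
    println!("{}", sum(&v3)); // &Vec<i32> coerces to &[i32]
    println!("{}", sum(&array)); // arrays work too
    println!("{}", sum(&v3[..4])); // so do sub-slices
}
```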
I've written a wrapper (using bindgen) for a camera library in Rust that commands and operates a camera and also saves an image to file. Once I command an exposure to start (basically telling the camera to take an image), I can grab the image using a function of the form:
pub fn GetQHYCCDSingleFrame(
handle: *mut qhyccd_handle,
w: *mut u32,
...,
imgdata: &mut [u8],) -> u32 //(u32 is a retval)
In C++, this function was:
uint32_t STDCALL GetQHYCCDSingleFrame(qhyccd_handle *handle, ..., uint8_t *imgdata)
In C++, I could pass in a buffer of the form imgdata = new unsigned char[length_buffer] and the function would fill the buffer with image data from the camera.
In Rust, similarly, I can pass in a buffer in the form of a Vec: let mut buffer: Vec<u8> = Vec::with_capacity(length_buffer).
Currently, the way I have structured the code is that there is a main struct, with settings such as the width and height of image, the camera handle, and others, including the image buffer. The struct has been initialized as a mut as:
let mut main_settings = MainSettings {
width: 9600,
...,
image_buffer: Vec::with_capacity(length_buffer),
};
There is a separate function I wrote that takes the main struct as a parameter and calls the GetQHYCCDSingleFrame function:
fn grab_image(main_settings: &mut MainSettings) {
let retval = unsafe { GetQHYCCDSingleFrame(main_settings.cam_handle, ..., &mut main_settings.image_buffer) };
}
Immediately after calling this function, if I check the length and capacity of main_settings.image_buffer:
println!("Elements in buffer are {}, capacity of buffer is {}.", main_settings.image_buffer.len(), main_settings.image_buffer.capacity());
I get 0 for the length and length_buffer for the capacity. Similarly, indexing into the buffer, such as main_settings.image_buffer[0] or main_settings.image_buffer[1], leads to a panic saying the len is 0.
This would make me think that the GetQHYCCDSingleFrame code is not working properly, however, when I save the image_buffer to file using fitsio and hdu.write_region (fitsio docs linked here), I use:
let ranges = [&(x_start..(x_start + roi_width)), &(y_start..(y_start+roi_height))];
hdu.write_region(&mut fits_file, &ranges, &main_settings.image_buffer).expect("Could not write to fits file");
This saves an actual image to file with the right size, and it is a perfectly fine image (exactly what it would look like if I took it using the C++ program). However, when I try to print the buffer, it is empty for some reason, yet the hdu.write_region code is somehow able to access the data.
Currently, my (not good) workaround is to create another vector that reads data from the saved file and saves to a buffer, which then has the right number of elements:
main_settings.new_buffer = hdu.read_region(&mut fits_file, &ranges).expect("Couldn't read fits file");
Why can I not access the original buffer at all, and why does it report length 0, when the hdu.write_region function can access data from somewhere? Where exactly is it accessing the data from, and how can I correctly access it as well? I am a bit new to borrowing and referencing, so I believe I might be doing something wrong in borrowing/referencing the buffer, or is it something else?
Sorry for the long story, but the details would probably be important for everything here. Thanks!
Well, first of all, you need to know that Vec<u8> and &mut [u8] are not quite the same as C or C++'s uint8_t *. The main difference is that Vec<u8> and &mut [u8] carry the length of the array or slice with them, while uint8_t * doesn't. The Rust equivalents of C/C++ pointers are raw pointers, like *mut u8. Raw pointers are safe to create but require unsafe to dereference. However, even though they are different types, a reference such as &mut [u8] can be converted to a raw pointer without issue (for example with as_mut_ptr()).
Secondly, the capacity of a Vec is different from its length. For performance, a Vec may allocate more memory than it uses, to avoid reallocating each time an element is added. The length is the size of the used part. In your case, you ask the Vec to allocate heap space for length_buffer bytes, but you don't tell it to consider any of that space used, so the initial length is 0. Since the C++ code doesn't know about Vec and only uses a raw pointer, it can't change the length stored inside the Vec, which stays at 0. Hence the panic.
To resolve it, I see multiple solutions:
Changing the Vec::with_capacity(length_buffer) into vec![0; length_buffer], explicitly asking for a length of length_buffer from the start.
Using unsafe code to explicitly set the length of the Vec without touching what is inside (using Vec::set_len, once the camera call has filled the buffer). This might be faster than the first solution, but I'm not sure.
Using a Box<[u8]> (for example vec![0; length_buffer].into_boxed_slice()), which is like a Vec but without reallocation and with a length equal to its capacity.
If your length_buffer is constant at compile time, using a [u8; length_buffer] would be much more efficient as no allocation is needed, but it comes with downsides, as you probably know.
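The first two options can be sketched as follows. This is not the camera API: fake_fill is a hypothetical stand-in for a C function that only receives a raw pointer and writes bytes through it, and the other helper names are illustrative too.

```rust
use std::ptr;

/// Hypothetical stand-in for the C frame grabber: it only sees a raw
/// pointer and writes `len` bytes through it.
unsafe fn fake_fill(dst: *mut u8, len: usize) {
    unsafe { ptr::write_bytes(dst, 0xAB, len) };
}

/// Option 1: give the Vec a real length up front.
fn filled_zeroed(n: usize) -> Vec<u8> {
    let mut buf = vec![0u8; n];
    unsafe { fake_fill(buf.as_mut_ptr(), buf.len()) };
    buf
}

/// Option 2: reserve capacity, let the callee write, then fix up the length.
fn filled_then_set_len(n: usize) -> Vec<u8> {
    let mut buf: Vec<u8> = Vec::with_capacity(n);
    unsafe {
        fake_fill(buf.as_mut_ptr(), n);
        // Only sound because the first `n` bytes are initialized by now.
        buf.set_len(n);
    }
    buf
}

fn main() {
    let n = 16; // hypothetical buffer length
    // Reproduces the question: the callee writes, but `len` stays 0.
    let mut buf: Vec<u8> = Vec::with_capacity(n);
    unsafe { fake_fill(buf.as_mut_ptr(), n) };
    println!("len = {}, capacity >= n: {}", buf.len(), buf.capacity() >= n);

    println!("{:?}", &filled_zeroed(n)[..4]);
    println!("{}", filled_then_set_len(n).len());
}
```

The vec![0; n] variant is the simpler of the two and stays entirely within what safe code can observe; set_len is only worth considering if zero-filling turns out to matter in a profile.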
Okay it's hard to describe it in words but let's say I have a map that stores int pointers, and want to store the result of an operation as another key in my hash:
m := make(map[string]*int)
m["d"] = &(*m["x"] + *m["y"])
This doesn't work and gives me the error: cannot take the address of *m["x"] + *m["y"]
Thoughts?
A pointer is a memory address. For example a variable has an address in memory.
The result of an operation like 3 + 4 does not have an address because there is no specific memory allocated for it. The result may just live in processor registers.
You have to allocate memory whose address you can put into the map. The easiest and most straightforward is to create a local variable for it.
See this example:
x, y := 1, 2
m := map[string]*int{"x": &x, "y": &y}
d := *m["x"] + *m["y"]
m["d"] = &d
fmt.Println(m["d"], *m["d"])
Output (try it on the Go Playground):
0x10438300 3
Note: If the code above is in a function, the address of the local variable (d) that we just put into the map will continue to live even if we return from the function (that is if the map is returned or created outside - e.g. a global variable). In Go it is perfectly safe to take and return the address of a local variable. The compiler will analyze the code and if the address (pointer) escapes the function, it will automatically be allocated on the heap (and not on the stack). For details see FAQ: How do I know whether a variable is allocated on the heap or the stack?
Note #2: There are other ways to create a pointer to a value (as detailed in this answer: How do I do a literal *int64 in Go?), but they are just "tricks" and are not nicer or more efficient. Using a local variable is the cleanest and recommended way.
For example this also works without creating a local variable, but it's obviously not intuitive at all:
m["d"] = &[]int{*m["x"] + *m["y"]}[0]
Output is the same. Try it on the Go Playground.
The result of the addition is placed somewhere transient (on the stack) and it would therefore not be safe to take its address. You should be able to work around this by explicitly allocating an int on the heap to hold your result:
result := new(int)
*result = *m["x"] + *m["y"]
m["d"] = result
In Go, you cannot take the address of the result of an arithmetic expression, because it is not addressable (informally, an r-value). Store it in a variable first and take the address of that:
package main
import "fmt"
func main() {
x := 3
y := 2
m := make(map[string]*int)
m["x"] = &x
m["y"] = &y
f := *m["x"] + *m["y"]
m["d"] = &f
fmt.Printf("Result: %d\n",*m["d"])
}
Have a look at this tutorial.
I'm looking for WP options/model that could allow me to prove basic C memory manipulations like :
memcpy : I've tried to prove this simple code :
struct header_src{
char t1;
char t2;
char t3;
char t4;
};
struct header_dest{
short t1;
short t2;
};
/*@ requires 0<=n<=UINT_MAX;
@ requires \valid(dest);
@ requires \valid_read(src);
@ assigns (dest)[0..n-1] \from (src)[0..n-1];
@ assigns \result \from dest;
@ ensures dest[0..n] == src[0..n];
@ ensures \result == dest;
*/
void* Frama_C_memcpy(char *dest, const char *src, uint32_t n);
int main(void)
{
struct header_src p_header_src;
struct header_dest p_header_dest;
p_header_src.t1 = 'e';
p_header_src.t2 = 'b';
p_header_src.t3 = 'c';
p_header_src.t4 = 'd';
p_header_dest.t1 = 0x0000;
p_header_dest.t2 = 0x0000;
//@ assert \valid(&p_header_dest);
Frama_C_memcpy((char*)&p_header_dest, (char*)&p_header_src, sizeof(struct header_src));
//@ assert p_header_dest.t1 == 0x6265;
//@ assert p_header_dest.t2 == 0x6463;
}
but the two last assert weren't verified by WP (with default prover Alt-Ergo). It can be proved thanks to Value analysis, but I mostly want to be able to prove the code not using abstract interpretation.
Cast pointer to int : Since I'm programming embedded code, I want to be able to specify something like:
#define MEMORY_ADDR 0x08000000
#define SOME_SIZE 10
struct some_struct {
uint8_t field1[SOME_SIZE];
uint32_t field2[SOME_SIZE];
};
// [...]
// some function body {
struct some_struct *p = (struct some_struct*)MEMORY_ADDR;
if(p == NULL) {
// Handle error
} else {
// Do something
}
// } body end
I've looked a little at WP's documentation, and it seems that the version of Frama-C that I use (Magnesium-20151002) has several memory models (Hoare, Typed, +cast, +ref, ...), but none of the examples above were proved with any of these models. The documentation explicitly says that the Typed model does not handle pointer-to-int casts. I have a lot of trouble understanding what's really going on under the hood with each WP model. It would really help me if I were able to verify at least the postconditions of the memcpy function. Plus, I have seen this issue about void pointers, which are apparently not handled very well by WP, at least in the Magnesium version. I haven't tried another version of Frama-C yet, but I think that newer versions handle void pointers in a better way.
Thank you very much in advance for your suggestions !
memcpy
Reasoning about the result of memcpy (or Frama_C_memcpy) is out of range of the current WP plugin. The only memory model that would work in your case is Bytes (page 13 of the manual for Chlorine), but it is not implemented.
Independently, please note that your postcondition from Frama_C_memcpy is not what you want here. You are asserting the equality of the sets dest[0..n] and src[0..n]. First, you should stop at n-1. Second, and more importantly, this is far too weak, and is in fact not sufficient to prove the two assertions in the caller. What you want is a quantification on all bytes. See e.g. the predicate memcmp in Frama-C's stdlib, or the variant \forall int i; 0 <= i < n -> dest[i] == src[i];
By the way, this postcondition holds only if dest and src are properly separated, which your function does not require. Otherwise, you should write dest[i] == \at (src[i], Pre). And your requires are also too weak for another reason, as you only require the first character to be valid, not the n first ones.
Cast pointer to int
Basically, all current models implemented in WP are unable to reason on codes in which the memory is accessed with multiple incompatible types (through the use of unions or pointer casts). In some cases, such as Typed, the cast is detected syntactically, and a warning is issued to warn that the code cannot be analyzed. The Typed+Cast model is a variant of Typed in which some casts are accepted. The analysis is correct only if the pointer is re-cast into its original type before being used. The idea is to allow using some libc functions that require a cast to void*, but not much more.
Your example is again a bit different, because it is likely that MEMORY_ADDR is always addressed with type some_struct. Would it be acceptable to change the code slightly, and change your function to take a pointer to this type? This way, you would "hide" the cast to MEMORY_ADDR inside a function that would remain unproven.
I tried this example in the latest version of Frama-C (of course the format is modified a little bit).
for the memcpy case
Assertion 2 fails but assertion 3 is successfully proved (basically because the failure of assertion 2 leads to a False assumption, which proves everything).
So in fact neither assertion can be proved, the same as in your problem.
This conclusion is sound because the memory models used in the WP plugin (as far as I know) make no assumption about the relation between fields in a struct: in header_src the first two fields are 8-bit chars, but they may not be laid out contiguously in physical memory like a char[2]. Instead, there may be padding between them (refer to the wiki for a detailed description). So when you try to copy the bytes of such a struct into another struct, Frama-C has no idea what the result is.
As far as I know, Frama-C does not support any way to precisely control the memory layout, e.g. gcc's PACKED, which forces the compiler to remove padding.
I am facing the same problem, and the (not elegant at all) solution is to use arrays instead. Array elements are always contiguous, so if you try to copy a char[4] to a short[2], I think the assertion can be proved.
for the Cast pointer to int case
With memory model Typed+cast, the current version I am using (Chlorine-20180501) supports casting between pointers and uint64_t. You may want to try this version.
Moreover, it is strongly suggested to call Z3 and CVC4 through Why3, whose performance is often better than Alt-Ergo's.
I'm a bit confused when I see code such as:
bigBox := &BigBox{}
bigBox.BubbleGumsCount = 4 // correct...
bigBox.SmallBox.AnyMagicItem = true // also correct
Why, or when, would I want to do bigBox := &BigBox{} instead of bigBox := BigBox{} ? Is it more efficient in some way?
Code sample was taken from here.
Sample no.2:
package main
import "fmt"
type Ints struct {
x int
y int
}
func build_struct() Ints {
return Ints{0,0}
}
func build_pstruct() *Ints {
return &Ints{0,0}
}
func main() {
fmt.Println(build_struct())
fmt.Println(build_pstruct())
}
Sample no. 3: ( why would I go with &BigBox in this example, and not with BigBox as a struct directly ? )
func main() {
bigBox := &BigBox{}
bigBox.BubbleGumsCount = 4
fmt.Println(bigBox.BubbleGumsCount)
}
Is there ever a reason to call build_pstruct instead of the build_struct variant? Isn't that why we have the GC?
I figured out one motivation for this kind of code: avoidance of "struct copying by accident".
If you use a struct variable to hold the newly created struct:
bigBox := BigBox{}
you may copy the struct by accident like this
myBox := bigBox // Where you just want a reference to bigBox.
myBox.BubbleGumsCount = 4
or like this
changeBoxColorToRed(bigBox)
where changeBoxColorToRed is
// It makes a copy of the entire struct as the parameter.
func changeBoxColorToRed(box BigBox) {
// !!!! This function is buggy. It won't work as expected !!!
// Please see the fix at the end.
box.Color = red
}
But if you use a struct pointer:
bigBox := &BigBox{}
there will be no copying in
myBox := bigBox
and
changeBoxColorToRed(bigBox)
will fail to compile, giving you a chance to rethink the design of changeBoxColorToRed. The fix is obvious:
func changeBoxColorToRed(box *BigBox) {
box.Color = red
}
The new version of changeBoxColorToRed does not copy the entire struct and works correctly.
bb := &BigBox{} creates a struct, but sets the variable to be a pointer to it. It's the same as bb := new(BigBox). On the other hand, bb := BigBox{} makes bb a variable of type BigBox directly. If you want a pointer (perhaps because you're going to use the data via a pointer), then it's better to make bb a pointer, otherwise you're going to be writing &bb a lot. If you're going to use the data as a struct directly, then you want bb to be a struct, otherwise you're going to be dereferencing with *bb.
It's off the point of the question, but it's usually better to create data in one go, rather than incrementally by creating the object and subsequently updating it.
bb := &BigBox{
BubbleGumsCount: 4,
SmallBox: SmallBox{ // assuming the field's type is also named SmallBox
AnyMagicItem: true,
},
}
The & takes the address of something. So it means "I want a pointer to" rather than "I want an instance of". The size of a variable containing a value depends on the size of the value, which could be large or small. The size of a variable containing a pointer is 8 bytes (on a 64-bit platform).
Here are examples and their meanings:
bigBox0 := &BigBox{} // bigBox0 is a pointer to an instance of BigBox{}
bigBox1 := BigBox{} // bigBox1 contains an instance of BigBox{}
bigBox2 := bigBox1 // bigBox2 is a copy of bigBox1
bigBox3 := &bigBox1 // bigBox3 is a pointer to bigBox1
bigBox4 := *bigBox3 // bigBox4 is a copy of bigBox1, dereferenced from bigBox3 (a pointer)
Why would you want a pointer?
To prevent copying a large object when passing it as an argument to a function.
You want to modify the value by passing it as an argument.
To keep a slice, backed by an array, small. [10]BigBox would take up "the size of BigBox" * 10 bytes. [10]*BigBox would take up 8 bytes * 10. A slice when resized has to create a larger array when it reaches its capacity. This means the memory of the old array has to be copied to the new array.
Why would you not want to use a pointer?
If an object is small, it's better just to make a copy. Especially if it's <= 8 bytes.
Using pointers can create garbage, which has to be collected by the garbage collector. When this was written, the garbage collector was a mark-and-sweep stop-the-world implementation, meaning it had to freeze your application to collect the garbage, and the more garbage there was, the longer that pause was (since Go 1.5 the collector runs mostly concurrently with the program, so pauses are much shorter). This individual, for example, experienced a pause of up to 10 seconds.
Copying an object uses the stack rather than the heap. The stack is usually faster than the heap. You really don't have to think about stack vs heap in Go as it decides what should go where, but you shouldn't ignore it either. It really depends on the compiler implementation, but pointers can result in memory going on the heap, resulting in the need for garbage collection.
Direct memory access is faster. If you have a slice []BigBox and it doesn't change size it can be faster to access. []BigBox is faster to read, whereas []*BigBox is faster to resize.
My general advice is use pointers sparingly. Unless you're dealing with a very large object that needs to be passed around, it's often better to pass around a copy on the stack. Reducing garbage is a big deal. The garbage collector will get better, but you're better off by keeping it as low as possible.
As always, test your application and profile it.
The difference is between creating a reference object (with the ampersand) vs. a value object (without the ampersand).
There's a nice explanation of the general concept of value vs. reference type passing here... What's the difference between passing by reference vs. passing by value?
There is some discussion of these concepts with regards to Go here... http://www.goinggo.net/2013/07/understanding-pointers-and-memory.html
In general there is no difference between a &BigBox{} and BigBox{}. The Go compiler is free to do whatever it likes as long as the semantics are correct.
func StructToStruct() {
s := Foo{}
StructFunction(&s)
}
func PointerToStruct() {
p := &Foo{}
StructFunction(p)
}
func StructToPointer() {
s := Foo{}
PointerFunction(&s)
}
func PointerToPointer() {
p := &Foo{}
PointerFunction(p)
}
//passed as a pointer, but used as struct
func StructFunction(f *Foo) {
fmt.Println(*f)
}
func PointerFunction(f *Foo) {
fmt.Println(f)
}
Summary of the assembly:
StructToStruct: 13 lines, no allocation
PointerToStruct: 16 lines, no allocation
StructToPointer: 20 lines, heap allocated
PointerToPointer: 12 lines, heap allocated
With a perfect compiler the *ToStruct functions would be identical, as would the *ToPointer functions. Go's escape analysis is good enough to tell if a pointer escapes even across module boundaries. Whichever way is most efficient is the way the compiler will do it.
If you're really into micro-optimization note that Go is most efficient when the syntax lines up with the semantics (struct used as a struct, pointer used as a pointer). Or you can just forget about it and declare the variable the way it will be used and you will be right most of the time.
Note: if Foo is really big, PointerToStruct will heap allocate it. The spec suggests that even StructToStruct is allowed to do this, but I couldn't make it happen. The lesson here is that the compiler will do whatever it wants. Just as the details of the registers are shielded from the code, so is the state of the heap/stack. Don't change your code because you think you know how the compiler is going to use the heap.