I'm trying to understand Rust pointer types and their relation to mutability. Specifically, the ways of declaring a variable which holds the pointer and is itself mutable -- i.e. can be pointed to some other memory, and declaring that the data itself is mutable -- i.e. can be changed through the value of the pointer variable.
This is how I understand plain references work:
let mut a = &5; // a is a mutable pointer to immutable data
let b = &mut 5; // b is an immutable pointer to mutable data
So a can be changed to point to something else, while b can't. However, the data b points to can be changed through b, while the data a points to can't be changed through a. Do I understand this correctly?
For the second part of the question -- why does Box::new seem to behave differently? This is my current understanding:
let mut a = Box::new(5); // a is a mutable pointer to mutable data
let c = Box::new(7); // c is an immutable pointer to immutable data
new should return a pointer to some heap-allocated data, but the data it points to seems to inherit mutability from the variable which holds the pointer, unlike in the example with references where these two states of mutability are independent! Is that how Box::new is supposed to work? If so, how can I create a pointer value to mutable data on the heap that is stored in an immutable variable?
First, you do understand how references behave correctly. mut a is a mutable variable (or, more correctly, a mutable binding), while &mut 5 is a mutable reference pointing to a mutable piece of data (which is implicitly allocated on the stack for you).
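As a minimal sketch of the two cases from the question, the snippet below separates mutability of the binding from mutability of the referent. The function names `rebind_reference` and `mutate_through_reference` are made up for this illustration, and named variables stand in for the literal `5`.

```rust
/// `mut` on the binding: the variable can be re-pointed, but the target is read-only.
fn rebind_reference() -> i32 {
    let x = 5;
    let y = 6;
    let mut a = &x; // mutable binding holding a shared reference
    assert_eq!(*a, 5);
    a = &y; // OK: the binding itself may change
    // *a = 7; // compile error: cannot assign through a `&` reference
    *a
}

/// `&mut` on the reference: the target can change, but the variable cannot be re-pointed.
fn mutate_through_reference() -> i32 {
    let mut x = 5;
    let b = &mut x; // immutable binding holding a mutable reference
    *b += 1; // OK: write through the reference
    // b = &mut other; // compile error: `b` is not a `mut` binding
    x
}

fn main() {
    println!("{}", rebind_reference());
    println!("{}", mutate_through_reference());
}
```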
Second, Box behaves differently from references because it is fundamentally different from references. Another name for Box is owning/owned pointer. Each Box owns the data it holds, and it does so uniquely, therefore mutability of this data is inherited from mutability of the box itself. So yes, this is exactly how Box should work.
Another, probably more practical, way to understand it is to consider Box<T> exactly equivalent to just T, except for its fixed size and allocation method. In other words, Box provides value semantics: it is moved around just like any value and its mutability depends on the binding it is stored in.
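To illustrate the value semantics described above, here is a small sketch (the function name `box_follows_binding` is hypothetical). It shows that the same heap value becomes mutable or immutable depending only on the binding that currently owns the Box.

```rust
/// Mutability of the boxed value follows the binding that owns the Box.
fn box_follows_binding() -> (i32, i32) {
    let mut a = Box::new(5);
    *a = 6; // OK: `a` is a `mut` binding, so the owned value is mutable

    let c = Box::new(7);
    // *c = 8; // compile error: `c` is not declared as mutable

    // Moving the Box into a new `mut` binding makes the same heap value
    // mutable, just as it would for a plain `i32`.
    let mut d = c;
    *d += 1;

    (*a, *d)
}

fn main() {
    println!("{:?}", box_follows_binding());
}
```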
There are several ways to create a pointer to a mutable piece of data on the heap while keeping the pointer immutable. The most generic one is RefCell:
use std::cell::RefCell;
struct X { id: u32 }
let x: Box<RefCell<X>> = Box::new(RefCell::new(X { id: 0 }));
x.borrow_mut().id = 1;
Alternatively, you can use Cell (for Copy types):
use std::cell::Cell;
let x: Box<Cell<u32>> = Box::new(Cell::new(0));
x.set(1);
Note that the above examples use so-called "interior mutability" (also called internal mutability), which is best avoided unless you really need it. If you want to create a Box with a mutable interior only to keep mutability properties, you really shouldn't. It isn't idiomatic and only adds a syntactic and semantic burden.
You can find a lot of useful information here:
Ownership
References and borrowing
Mutability
std::cell - internal mutability types
In fact, if you have a question about such fundamental things as mutability, it is probably already explained in the book :)
Looking through the documentation for std::cell::Cell, I don't see anywhere how I can retrieve a non-mutable reference to inner data. There is only the get_mut method: https://doc.rust-lang.org/std/cell/struct.Cell.html#method.get_mut
I don't want to use this function because I want to take &self instead of &mut self.
I found an alternative solution of taking the raw pointer:
use std::cell::Cell;
struct DbObject {
key: Cell<String>,
data: String
}
impl DbObject {
pub fn new(data: String) -> Self {
Self {
key: Cell::new("some_uuid".into()),
data,
}
}
pub fn assert_key(&self) -> &str {
// setup key in the future if is empty...
let key = self.key.as_ptr();
unsafe {
let inner = key.as_ref().unwrap();
return inner;
}
}
}
fn main() {
let obj = DbObject::new("some data...".into());
let key = obj.assert_key();
println!("Key: {}", key);
}
Is there any way to do this without using unsafe? If not, perhaps RefCell will be more practical here?
Thank you for help!
First off, if you have a &mut T, you can trivially get a &T out of it. So you can use get_mut to get a &T.
But to get a &mut T from a Cell<T> you need that cell to be mutable, as get_mut takes a &mut self parameter. And this is by design the only way to get a reference to the inner object of a cell.
By requiring the use of a &mut self method to get a reference out of a cell, you make it possible to check for exclusive access at compile time with the borrow checker. Remember that a cell enables interior mutability, and has a method set(&self, val: T), that is, a method that can modify the value of a non-mut binding! If there was a get(&self) -> &T method, the borrow checker could not ensure that you do not hold a reference to the inner object while setting the object, which would not be safe.
TL;DR: By design, you can't get a &T out of a non-mut Cell<T>. Use get_mut (which requires a mut cell), or set/replace (which work on a non-mut cell). If this is not acceptable, then consider using RefCell, which can get you a &T out of a non-mut instance, at some runtime cost.
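A rough sketch of both options from the TL;DR, under some assumptions: the struct is a variant of the question's DbObject with the key moved into a RefCell, an empty string is taken to mean "key not set yet", and `cell_roundtrip` is a made-up helper showing the operations a plain Cell does allow.

```rust
use std::cell::{Cell, Ref, RefCell};

/// Variant of the question's struct with the key stored in a `RefCell`.
struct DbObject {
    key: RefCell<String>,
    data: String,
}

impl DbObject {
    fn new(data: String) -> Self {
        Self { key: RefCell::new(String::new()), data }
    }

    /// Fills in the key on first use, then hands out a runtime-checked shared borrow.
    fn assert_key(&self) -> Ref<'_, str> {
        // The temporary borrow in the condition is released before the body runs.
        if self.key.borrow().is_empty() {
            // Placeholder value taken from the question.
            *self.key.borrow_mut() = "some_uuid".to_string();
        }
        Ref::map(self.key.borrow(), |k| k.as_str())
    }
}

/// With a plain `Cell`, values are copied, replaced or moved out, never borrowed.
fn cell_roundtrip() -> (u32, String) {
    let counter = Cell::new(1u32);
    counter.set(counter.get() + 1); // `get` requires `T: Copy`

    let name = Cell::new(String::from("old"));
    let previous = name.replace(String::from("new")); // moves the old value out
    let current = name.take(); // leaves an empty `String` behind
    (counter.get(), previous + "/" + &current)
}

fn main() {
    let obj = DbObject::new("some data...".into());
    println!("Key: {} (data: {})", &*obj.assert_key(), obj.data);
    println!("{:?}", cell_roundtrip());
}
```

The returned `Ref` has to be dropped before anything calls `borrow_mut` on the key again, otherwise the RefCell panics at runtime. That is the runtime cost mentioned above.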
In addition to @mcarton's answer: in order to keep interior mutability sound, that is, to prevent a mutable reference from coexisting with other references, there are three different approaches:
Using unsafe with the possibility of Undefined Behavior. This is what UnsafeCell does.
Have some runtime checks, involving runtime overhead. This is the approach RefCell, RwLock and Mutex use.
Restrict the operations that can be done with the abstraction. This is what Cell, Atomic* and OnceCell (unstable at the time of writing, and thus also Lazy, which uses it) do (note that the thread-safe types also have runtime overhead because they need to provide some sort of locking). Each provides a different set of allowed operations:
Cell and Atomic* do not let you get a reference to the contained value; you can only replace it as a whole (basically, get() and set(), though convenience methods are provided on top of these, such as swap()). Projection (cell-of-slice to slice-of-cells) is also available for Cell (field projection is possible, but not provided as part of std).
OnceCell allows you to assign only once and only then take a shared reference, guaranteeing that when you assign you have no references, and that while you have shared references you cannot assign anymore.
Thus, when you need to be able to take a reference into the content, you cannot choose Cell as it was not designed for that - the obvious choice is RefCell, indeed.
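If the key only ever needs to be set once, the OnceCell approach described above also fits the question's use case and returns a plain `&str`. This is a sketch under two assumptions: a toolchain where `std::cell::OnceCell` is available (it was stabilized in Rust 1.70), and the question's placeholder string standing in for a real key generator.

```rust
use std::cell::OnceCell;

/// Variant of the question's struct with a write-once key.
struct DbObject {
    key: OnceCell<String>,
    data: String,
}

impl DbObject {
    fn new(data: String) -> Self {
        Self { key: OnceCell::new(), data }
    }

    /// Initializes the key on the first call; later calls return the same value.
    fn assert_key(&self) -> &str {
        // Placeholder value taken from the question.
        self.key.get_or_init(|| "some_uuid".to_string()).as_str()
    }
}

fn main() {
    let obj = DbObject::new("some data...".into());
    println!("Key: {} (data: {})", obj.assert_key(), obj.data);
}
```

The trade-off compared with RefCell is that the key can never be changed through `&self` once it has been initialized.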
I just started learning Go, and while going through the slice tricks, a couple of points confused me. Can anyone help me clarify them?
To cut elements from a slice, the following is given:
Approach 1:
a = append(a[:i], a[j:]...)
but there is a note saying that it may cause memory leaks if pointers are used, and that the recommended way is
Approach 2:
copy(a[i:], a[j:])
for k, n := len(a)-j+i, len(a); k < n; k++ {
a[k] = nil // or the zero value of T
}
a = a[:len(a)-j+i]
Can anyone help me understand how the memory leak happens?
I understand that a sub-slice is backed by the original array. My thinking is that, pointers or not, we should always follow approach 2.
Update after @icza's and @Volker's answers:
Let's say you have a struct
type Books struct {
title string
author string
}
var Book1 Books
var Book2 Books
/* book 1 specification */
Book1.title = "Go Programming"
Book1.author = "Mahesh Kumar"
Book2.title = "Go Programming"
Book2.author = "Mahesh Kumar"
var bkSlice = []Books{Book1, Book2}
var bkprtSlice = []*Books{&Book1, &Book2}
now doing
bkSlice = bkSlice[:1]
bkSlice still holds Book2 in its backing array, which stays in memory even though it is no longer needed.
So do we need to do
bkSlice[1] = Books{}
so that it will be GCed? I understand that pointers have to be set to nil, because otherwise the slice keeps unnecessary references to objects outside the backing array.
This is most simply demonstrated with a basic slice expression.
Let's start with a slice of *int pointers:
s := []*int{new(int), new(int)}
This slice has a backing array with a length of 2, and it contains 2 non-nil pointers, pointing to allocated integers (outside of the backing array).
Now if we reslice this slice:
s = s[:1]
The length becomes 1. The backing array (holding 2 pointers) is not touched; it still holds 2 valid pointers. Even though we no longer use the 2nd pointer, it is still in memory (in the backing array), so the pointed-to object (the memory space storing an int value) cannot be freed by the garbage collector.
The same thing happens if you "cut" multiple elements from the middle. If the original slice (and its backing array) was filled with non-nil pointers, and if you don't zero them (with nil), they will be kept in memory.
Why isn't this an issue with non-pointers?
Actually, this is an issue with all pointer and "header" types (like slices and strings), not just pointers.
If you would have a slice of type []int instead of []*int, then slicing it will just "hide" elements that are of int type which must stay in memory as part of the backing array regardless of if there's a slice that contains it or not. The elements are not references to objects stored outside of the array, while pointers refer to objects being outside of the array.
If the slice contains pointers and you nil them before the slicing operation, if there are no other references to the pointed objects (if the array was the only one holding the pointers), they can be freed, they will not be kept due to still having a slice (and thus the backing array).
Update:
When you have a slice of structs:
var bkSlice = []Books{Book1, Book2}
If you slice it like:
bkSlice = bkSlice[:1]
Book2 will become unreachable via bkSlice, but will still be in memory (as part of the backing array).
You can't nil it because nil is not a valid value for structs. You can however assign its zero value to it like this:
bkSlice[1] = Books{}
bkSlice = bkSlice[:1]
Note that a Books struct value will still be in memory, being the second element of the backing array, but that struct will be a zero value, and thus will not hold string references, thus the original book author and title strings can be garbage collected (if no one else references them; more precisely the byte slice referred from the string header).
The general rule is "recursive": You only need to zero elements that refer to memory located outside of the backing array. So if you have a slice of structs that only have e.g. int fields, you do not need to zero it, in fact it's just unnecessary extra work. If the struct has fields that are pointers, or slices, or e.g. other struct type that have pointers or slices etc., then you should zero it in order to remove the reference to the memory outside of the backing array.
When passing two elements from the same vector to a function, the borrow checker will not allow one of the elements to be mutable.
struct Point {
x: i32,
y: i32,
}
fn main() {
let mut vec: Vec<Point> = Vec::new();
foo(&mut vec[0], &vec[1]);
}
fn foo(pnt_1: &mut Point, pnt_2: &Point) {
}
error: cannot borrow vec as immutable because it is also borrowed as mutable
vec is never borrowed by foo though, vec[0] is borrowed and vec[0] is a Point.
How can I pass multiple elements from the same collection into a function with one or more of the elements being mutable?
How can I pass multiple elements from the same collection into a function with one or more of the elements being mutable?
The short answer is that you cannot, at least not without support from the collection itself.
Rust disallows mutable aliases - multiple names for the same thing, one of which allows mutation.
It would be far too complicated (with the current state of programming languages) to verify that (&mut vec[0], &vec[1]) does not introduce aliasing but (&mut vec[0], &vec[0]) does. Adding to the complexity is the fact that the [] operator can be overloaded, which allows creating a type such that foo[0] and foo[1] actually point at the same thing.
So, how can a collection help out? Each collection will have (or not have) a specific way of subdivision in an aliasing-safe manner.
There can be methods like slice::split_at_mut, which guarantee that the two halves cannot overlap and thus that no aliasing can occur.
Unfortunately, there's no HashMap::get_two_things(&a, &b) that I'm aware of. It would be pretty niche, but that doesn't mean it couldn't exist.
vec is never borrowed by foo though
It most certainly is. When you index a Vec, you are getting a reference to some chunk of memory inside the Vec. If the Vec were to change underneath you, such as when someone adds or removes a value, then the underlying memory may need to be reallocated, invalidating the reference. This is a prime example of why mutable aliasing is a bad thing.
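Applied to the question's code, the slice::split_at_mut approach mentioned above could look roughly like this. The body of `foo` and the helper `update_first_from_second` are invented for the illustration, and the vector is assumed to contain at least two points.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Hypothetical body for the question's `foo`: copy the second point into the first.
fn foo(pnt_1: &mut Point, pnt_2: &Point) {
    pnt_1.x = pnt_2.x;
    pnt_1.y = pnt_2.y;
}

/// Borrows element 0 mutably and element 1 immutably at the same time.
/// Panics if `points` has fewer than two elements.
fn update_first_from_second(points: &mut [Point]) {
    // Splits into [0, 1) and [1, len); the two halves cannot overlap.
    let (head, tail) = points.split_at_mut(1);
    foo(&mut head[0], &tail[0]);
}

fn main() {
    let mut vec = vec![Point { x: 0, y: 0 }, Point { x: 3, y: 4 }];
    update_first_from_second(&mut vec);
    println!("{:?}", vec);
}
```

Note that `tail[0]` is the element that was at index 1 in the original vector, since indices in the second half start again from zero.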
I often find myself getting an error like this:
mismatched types: expected `collections::vec::Vec<u8>`, found `&[u8]` (expected struct collections::vec::Vec, found &-ptr)
As far as I know, one is mutable and one isn't but I've no idea how to go between the types, i.e. take a &[u8] and make it a Vec<u8> or vice versa.
What's the difference between them? Is it the same as String and &str?
Is it the same as String and &str?
Yes. A Vec<T> is the owned variant of a &[T]. &[T] is a reference to a set of Ts laid out sequentially in memory (a.k.a. a slice). It represents a pointer to the beginning of the items and the number of items. A reference refers to something that you don't own, so the set of actions you can do with it are limited. There is a mutable variant (&mut [T]), which allows you to mutate the items in the slice. You can't change how many are in the slice though. Said another way, you can't mutate the slice itself.
take a &[u8] and make it a Vec
For this specific case:
let s: &[u8]; // Set this somewhere
Vec::from(s);
However, this has to allocate memory on the heap, then copy each value into that memory. It's more expensive than the other way, but might be the correct thing for a given situation.
or vice versa
let v = vec![1u8, 2, 3];
let s = v.as_slice();
This is basically "free" as v still owns the data, we are just handing out a reference to it. That's why many APIs try to take slices when it makes sense.
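Putting both directions together, a short sketch (the helper names `to_owned_bytes` and `sum` are made up for this example) of why taking a slice parameter is convenient and where the allocation happens:

```rust
/// Slice to Vec: allocates a new buffer and copies the elements.
fn to_owned_bytes(s: &[u8]) -> Vec<u8> {
    s.to_vec() // same result as `Vec::from(s)`
}

/// Takes a slice, so callers can pass a `Vec`, an array, or part of either.
fn sum(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    let v: Vec<u8> = vec![1, 2, 3];
    // Vec to slice: no copy, `v` still owns the data.
    let s: &[u8] = &v; // same as `v.as_slice()` or `&v[..]`
    let copy = to_owned_bytes(&s[1..]);
    println!("{} {} {:?}", sum(&v), sum(s), copy);
}
```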
How do I take the address of a value inside an interface?
I have a struct stored in an interface, in a list.List element:
import "container/list"
type retry struct{}
p := &el.Value.(retry)
But I get this:
cannot take the address of el.Value.(retry)
What's going on? Since the struct is stored in the interface, why can't I get a pointer to it?
To understand why this isn't possible, it is helpful to think about what an interface variable actually is. An interface value takes up two words, with the first describing the type of the contained value, and the second either (a) holding the contained value (if it fits within the word) or (b) a pointer to storage for the value (if the value does not fit within a word).
The important things to note are that (1) the contained value belongs to the interface variable, and (2) the storage for that value may be reused when a new value is assigned to the variable. Knowing that, consider the following code:
var v interface{}
v = int(42)
p := GetPointerToInterfaceValue(&v) // a pointer to an integer holding 42
v = &SomeStruct{...}
Now the storage for the integer has been reused to hold a pointer, and *p is now an integer representation of that pointer. You can see how this has the capacity to break the type system, so Go doesn't provide a way to do this (outside of using the unsafe package).
If you need a pointer to the structs you're storing in a list, then one option would be to store pointers to the structs in the list rather than struct values directly. Alternatively, you could pass *list.Element values as references to the contained structures.
A type assertion is an expression that results in two values. Taking the address in this case would be ambiguous.
p, ok := el.Value.(retry)
if ok {
// type assertion successful
// now we can take the address
q := &p
}
From the comments:
Note that this is a pointer to a copy of the value rather than a pointer to the value itself.
— James Henstridge
The solution to the problem is therefore simple; store a pointer in the interface, not a value.
Get pointer to interface value?
Is there a way, given a variable of interface type, of getting a
pointer to the value stored in the variable?
It is not possible.
Rob Pike
Interface values are not necessarily addressable. For example,
package main
import "fmt"
func main() {
var i interface{}
i = 42
// cannot take the address of i.(int)
j := &i.(int)
fmt.Println(i, j)
}
Address operators
For an operand x of type T, the address operation &x generates a
pointer of type *T to x. The operand must be addressable, that is,
either a variable, pointer indirection, or slice indexing operation;
or a field selector of an addressable struct operand; or an array
indexing operation of an addressable array. As an exception to the
addressability requirement, x may also be a composite literal.
References:
Interface types
Type assertions
Go Data Structures: Interfaces
Go Interfaces
To a first approximation: you cannot do that. Even if you could, p itself would then have to have type interface{} and would not be very helpful, since you could not directly dereference it.
The obligatory question is: What problem are you trying to solve?
And last but not least: Interfaces define behavior not structure. Using the interface's underlying implementing type directly in general breaks the interface contract, although there might be non general legitimate cases for it. But those are already served, for a finite set of statically known types, by the type switch statement.