Why doesn't the address of the pointer change when modifying a String variable in Rust?

I thought Rust allocates new data on the heap when a string is modified, so I expected the pointer address to change when I push a value onto the string variable.
fn main() {
let mut hello = String::from("hello");
println!("{:?}", hello.as_ptr()); // 0x7fcfa7c01be0
hello.push_str(", world!");
println!("{:?}", hello.as_ptr()); // 0x7fcfa7c01be0
}
However, the result shows otherwise. The pointer address did not change, so I tested the same thing with a vector.
fn main() {
let mut numbers = vec![1, 2, 3];
println!("{:?}", numbers.as_ptr()); // 0x7ffac4401be0
numbers.push(4);
println!("{:?}", numbers.as_ptr()); // 0x7ffac4401ce0
}
The vector's pointer address changed when I modified it. What is the difference between how String and Vec use memory?

Vec<T> and String may maintain extra space to avoid allocating on every push operation. This provides amortized O(1) time for push operations.
It happens to be the case that the vec! macro is guaranteed to create a vector without such extra space, while String::from(&str) does not have such a guarantee.
See https://doc.rust-lang.org/std/vec/struct.Vec.html#capacity-and-reallocation for more details.
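To see the spare-capacity behaviour described above, here is a minimal sketch. The helper names (fresh_vec_stats, push_grows_full_vec) are made up for illustration. It relies only on the documented guarantee that vec![a, b, c] produces a vector with exactly the requested capacity; the exact numbers printed for String::from may differ between Rust versions.

```rust
/// Returns (length, capacity) right after `vec![1, 2, 3]`.
fn fresh_vec_stats() -> (usize, usize) {
    let numbers = vec![1, 2, 3];
    (numbers.len(), numbers.capacity())
}

/// Returns true if pushing one element onto a full vector grew its capacity.
fn push_grows_full_vec() -> bool {
    let mut numbers = vec![1, 2, 3];
    let before = numbers.capacity();
    numbers.push(4); // no spare room, so the capacity has to grow
    numbers.capacity() > before
}

fn main() {
    let (len, cap) = fresh_vec_stats();
    println!("vec![1, 2, 3]: len = {len}, capacity = {cap}");
    println!("push grew the capacity: {}", push_grows_full_vec());

    let hello = String::from("hello");
    // No exact-capacity guarantee here; only capacity >= len is certain.
    println!(
        "String::from: len = {}, capacity = {}",
        hello.len(),
        hello.capacity()
    );
}
```

Whether a capacity change also moves the buffer is up to the allocator, which is why the address may or may not change.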

A String is like a Vec<T>¹ in that it has both a length and a capacity. If the capacity of the current allocation is big enough to hold the new string, the underlying buffer does not need to be reallocated. The documentation for Vec<T> explains it this way:
The capacity of a vector is the amount of space allocated for any future elements that will be added onto the vector. This is not to be confused with the length of a vector, which specifies the number of actual elements within the vector. If a vector's length exceeds its capacity, its capacity will automatically be increased, but its elements will have to be reallocated.
For example, a vector with capacity 10 and length 0 would be an empty vector with space for 10 more elements. Pushing 10 or fewer elements onto the vector will not change its capacity or cause reallocation to occur.
However, even if the capacity does change, the pointer value is still not guaranteed to move. The system allocator itself may be able to resize the allocation without moving it if there is enough unallocated space adjacent to it. That appears to be what's happening in your code. If you print the capacity along with the pointer, you can observe this behavior:
fn main() {
    let mut hello = String::from("hello");
    for _ in 0..10 {
        // Print the current capacity next to the buffer address.
        println!("({:3}) {:?}", hello.capacity(), hello.as_ptr());
        hello.push_str(", world!");
    }
}
( 5) 0x557624d8da40
( 13) 0x557624d8da40
( 26) 0x557624d8dba0
( 52) 0x557624d8dba0
( 52) 0x557624d8dba0
( 52) 0x557624d8dba0
(104) 0x557624d8dba0
(104) 0x557624d8dba0
(104) 0x557624d8dba0
(104) 0x557624d8dba0
In this example, the buffer was resized 4 times, but the contents were only moved once.
¹ Actually, a String is a newtyped Vec<u8>, which explains why they work the same.
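If you want the address to stay put, the one case you can rely on is pushing within capacity you reserved up front. A small sketch (the function name is hypothetical, and 64 is an arbitrary capacity picked for the example):

```rust
/// Returns true if the buffer address stayed the same across a push that
/// fits inside capacity reserved up front.
fn pointer_is_stable_within_capacity() -> bool {
    let mut hello = String::with_capacity(64);
    hello.push_str("hello");
    let before = hello.as_ptr();
    hello.push_str(", world!"); // 13 bytes in total, well within 64
    let after = hello.as_ptr();
    before == after
}

fn main() {
    println!(
        "pointer unchanged while within capacity: {}",
        pointer_is_stable_within_capacity()
    );
}
```

Once the length would exceed the capacity, the buffer has to grow, and the address may or may not change as shown in the output above.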

Related

How to traverse character elements of *const char pointer in Rust?

I'm new to Rust programming and I'm having some difficulty where the language differs from C. For example, I have a C function as follows:
bool check(char* data, int size){
int i;
for(i = 0; i < size; i++){
if( data[i] != 0x00){
return false;
}
}
return true;
}
How can I convert this function to Rust? I tried writing it the same way as in C, but it produces errors :((
First off, I assume that you want to use as little unsafe code as possible. Otherwise there really isn't any reason to use Rust in the first place, as you forfeit all the advantages it brings you.
Depending on what data represents, there are multiple ways to transfer this to Rust.
To begin with: using a pointer and a length as two separate arguments is not possible in Rust without unsafe. Rust has the same concept, though; it's called a slice. A slice is exactly the same as a pointer-size combination, except that the compiler understands it: borrows are checked at compile time and indexing is bounds-checked at run time.
That said, a char* in C could actually be one of four things. Each of those things maps to a different type in Rust:
1. Binary data whose deallocation is taken care of somewhere else (in Rust terms: borrowed data)
   - maps to &[u8], a slice. The actual content of the slice is:
     - the address of the data as *const u8 (hidden from the user)
     - the length of the data as usize
2. Binary data that has to be deallocated within this function after using it (in Rust terms: owned data)
   - maps to Vec<u8>; as soon as it goes out of scope the data is deleted
   - actual content is:
     - the address of the data as *mut u8 (hidden from the user)
     - the length of the data as usize
     - the size of the allocation as usize. This allows for efficient push()/pop() operations. It is guaranteed that the length of the data does not exceed the size of the allocation.
3. A string whose deallocation is taken care of somewhere else (in Rust terms: a borrowed string)
   - maps to &str, a so-called string slice.
   - This is identical to &[u8] with the additional compile-time guarantee that it contains valid UTF-8 data.
4. A string that has to be deallocated within this function after using it (in Rust terms: an owned string)
   - maps to String
   - same as Vec<u8> with the additional compile-time guarantee that it contains valid UTF-8 data.
You can create &[u8] references from Vec<u8>'s and &str references from Strings.
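As a rough illustration of that last point, here is how the borrowed forms can be obtained from the owned ones. The two helper functions are invented for this sketch; only the standard conversions (&Vec<u8> to &[u8], &String to &str, and as_bytes) are the point.

```rust
/// Accepts any borrowed byte data, no matter who owns it.
fn byte_count(data: &[u8]) -> usize {
    data.len()
}

/// Accepts any borrowed string data, no matter who owns it.
fn char_count(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    let owned_bytes: Vec<u8> = vec![0, 1, 2];
    let owned_text: String = String::from("hello");

    // &Vec<u8> coerces to &[u8], and &String coerces to &str.
    println!("bytes: {}", byte_count(&owned_bytes));
    println!("chars: {}", char_count(&owned_text));

    // A string can also be viewed as its underlying UTF-8 bytes.
    println!("string as bytes: {}", byte_count(owned_text.as_bytes()));
}
```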
Now this is the point where I have to make an assumption. Because the function that you posted checks whether all of the elements of data are zero, and returns false if it finds a non-zero element, I assume the content of data is binary data. And because your function does not contain a free call, I assume it is borrowed data.
With that knowledge, this is how the given function would translate to Rust:
fn check(data: &[u8]) -> bool {
for d in data {
if *d != 0x00 {
return false;
}
}
true
}
fn main() {
let x = vec![0, 0, 0];
println!("Check {:?}: {}", x, check(&x));
let y = vec![0, 1, 0];
println!("Check {:?}: {}", y, check(&y));
}
Check [0, 0, 0]: true
Check [0, 1, 0]: false
This is quite a direct translation; using for loops a lot is not really idiomatic in Rust. Good Rust code is mostly iterator based; iterators are, most of the time, zero-cost abstractions that compile very efficiently.
This is what your code would look like if rewritten with iterators:
fn check(data: &[u8]) -> bool {
data.iter().all(|el| *el == 0x00)
}
fn main() {
let x = vec![0, 0, 0];
println!("Check {:?}: {}", x, check(&x));
let y = vec![0, 1, 0];
println!("Check {:?}: {}", y, check(&y));
}
Check [0, 0, 0]: true
Check [0, 1, 0]: false
The reason this is more idiomatic is that it's a lot easier to read for someone who hasn't written it. It clearly says "return true if all elements are equal to zero". The for-based code takes a second of thought to work out whether it means "all elements are zero", "any element is zero", "all elements are non-zero" or "any element is non-zero".
Note that with optimizations enabled, both versions typically compile to the same machine code.
Also note that, unlike the C version, the Rust borrow checker guarantees at compile time that data is valid. It's impossible in Rust (without unsafe) to produce a double free, a use-after-free, an out-of-bounds array access or any other kind of undefined behaviour that would cause memory corruption.
This is also the reason why Rust doesn't do pointers without unsafe - it needs the length of the data to check for out-of-bounds errors at runtime. That means accessing data via the [] operator can be a little more costly in Rust (as it performs a bounds check unless the compiler can prove the index is in range), which is one reason why iterator-based programming is a thing. Iterators can often iterate over data more efficiently than indexing it with the [] operator.
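If the data really does arrive as a raw pointer plus a length (for example across an FFI boundary), one possible approach is to turn it into a slice once, at the edge, and keep everything else safe. This is only a sketch: check_raw is a hypothetical wrapper, and treating a null pointer as empty input is an assumption made for the example.

```rust
use std::slice;

/// Safe check: true if every byte is zero.
fn check(data: &[u8]) -> bool {
    data.iter().all(|&byte| byte == 0)
}

/// Hypothetical wrapper for callers that only have a pointer and a length.
///
/// # Safety
/// If `data` is non-null, it must point to `size` readable, initialized
/// bytes that stay valid and unmodified for the duration of the call.
unsafe fn check_raw(data: *const u8, size: usize) -> bool {
    // Assumption of this sketch: null or empty input has no non-zero byte.
    if data.is_null() || size == 0 {
        return true;
    }
    // SAFETY: the caller upholds the contract documented above.
    let bytes = unsafe { slice::from_raw_parts(data, size) };
    check(bytes)
}

fn main() {
    let zeros = [0u8, 0, 0];
    let mixed = [0u8, 1, 0];
    // SAFETY: both pointers come from live arrays of the given length.
    let (a, b) = unsafe {
        (
            check_raw(zeros.as_ptr(), zeros.len()),
            check_raw(mixed.as_ptr(), mixed.len()),
        )
    };
    println!("zeros: {a}, mixed: {b}");
}
```

The unsafe part is confined to the one place where the pointer and length are trusted; check itself never sees a raw pointer.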

Misunderstanding of how the Read trait works for TcpStreams

My goal is to read some bytes from a TcpStream in order to parse the data in each message and build a struct from it.
loop {
let mut buf: Vec<u8> = Vec::new();
let len = stream.read(&mut buf)?;
if 0 == len {
//Disconnected
}
println!("read() -> {}", len);
}
Like in Python, I thought the stream.read() would block until it received some data.
So I've set up a server that calls the loop you see above for each incoming connection. I've then tried to connect to the server with netcat; netcat connects successfully to the server and blocks on the stream.read(), which is what I want; but as soon as I send some data, read() returns 0.
I've also tried doing something similar with stream.read_to_end(), but it appears to return only when the connection is closed.
How can I read from the TcpStream, message by message, knowing that each message can have a different, unknown size?
You're getting caught with your pants down by an underlying technicality of Vec more than by std::io::Read, although they both interact in this particular case.
The definition and documentation of Read state:
If the return value of this method is Ok(n), then it must be guaranteed that 0 <= n <= buf.len(). A nonzero n value indicates that the buffer buf has been filled in with n bytes of data from this source. If n is 0, then it can indicate one of two scenarios:
The important part is the guarantee that 0 <= n <= buf.len().
When you define a new Vec the way you did, it starts with a length (and capacity) of zero. This means that the slice you pass as a buffer has a length of zero. As a result, since it must be guaranteed that 0 <= n <= buf.len() and since buf.len() is zero, your read() call immediately returns with 0 bytes read.
To fix this, you can either create your Vec with a default set of elements (let mut buf = vec![0u8; 1024];), or just use an array from the get-go (let mut buf: [u8; 1024] = [0; 1024];).
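The effect can be reproduced without a network connection, because &[u8] also implements Read. In the sketch below, read_once is a made-up helper and the in-memory byte slice stands in for the TcpStream:

```rust
use std::io::{self, Read};

/// Reads once from `source` into a buffer of `buf_len` zeroed bytes and
/// returns how many bytes `read` reported.
fn read_once(mut source: impl Read, buf_len: usize) -> io::Result<usize> {
    let mut buf = vec![0u8; buf_len];
    source.read(&mut buf)
}

fn main() -> io::Result<()> {
    // An in-memory stand-in for the stream.
    let message: &[u8] = b"hello";

    // A zero-length buffer can never receive anything.
    println!("zero-length buffer: {}", read_once(message, 0)?);
    // A buffer with real length receives the available bytes.
    println!("1024-byte buffer:   {}", read_once(message, 1024)?);
    Ok(())
}
```

With a real TcpStream the second call would block until data arrives, and a return value of 0 would then mean the peer closed the connection.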

Will an array of pointers be equal to an array of chars?

I have got this code:
import std.stdio;
import std.string;
void main()
{
char [] str = "aaa".dup;
char [] *str_ptr;
writeln(str_ptr);
str_ptr = &str;
*(str_ptr[0].ptr) = 'f';
writeln(*str_ptr);
writeln(str_ptr[0][1]);
}
I thought that with char [] *str_ptr I was creating an array of pointers, so that every single pointer would point to a single char. But it looks like str_ptr points to the start of the string str. I came to this conclusion because when I try to access, for example, writeln(str_ptr[1]); I get a lot of data on the console output, which means I am reading an element outside the boundary.
Could anybody explain whether it's an array of pointers and, if so, how an array of pointers works in this case?
What you're trying to achieve is far more easily done: just index the char array itself. No need to go through explicit pointers.
import std.stdio;
import std.string;
void main()
{
char [] str = "aaa".dup;
str[0] = 'f';
writeln(str[0]); // str[x] points to individual char
writeln(str); // faa
}
An array in D already is a pointer on the inside - it consists of a pointer to its elements and a length, and indexing it gets you to those individual elements. str[1] leads to the second char (remember, it starts at zero), exactly the same as *(str.ptr + 1). Indeed, the compiler generates that very code (plus bounds checking, which D does by default, so it aborts instead of giving you gibberish). The only caveat is that an array's elements must be sequential in memory. This is T[] in D.
An array of pointers might be used if the pointers all go to various places that are not necessarily in sequence. Maybe you want the first pointer to go to the last element, and the second pointer to go to the first element. Or perhaps they are all separately allocated elements, like pointers to objects. The correct syntax for this in D is T*[] - read from right to left, "an array of pointers to T".
A pointer to an array is pretty rare in D. It is written T[]*, and you might use it when you need to update the length of some other array held by another function. For example:
int[] arr;
int[]* ptr = &arr;
(*ptr) ~= 1;
assert(arr.length == 1);
If ptr wasn't a pointer, the arr length would not be updated:
int[] arr;
int[] ptr = arr;
ptr ~= 1;
assert(arr.length == 1); // NOPE! fails, arr is still empty
But pointers to arrays are about modifying the length of the array, or maybe pointing it to something entirely new and updating the original. A pointer to an array isn't needed just to share the individual elements inside it.

Safety of set_len operation on Vec, with predefined capacity

Is it safe to call set_len on Vec that has declared capacity? Like this:
let vec = unsafe {
    let mut temp = Vec::with_capacity(N);
    temp.set_len(N);
    temp
};
I need my Vector to be of size N before any elements are to be added.
Looking at docs:
https://doc.rust-lang.org/collections/vec/struct.Vec.html#capacity-and-reallocation
https://doc.rust-lang.org/collections/vec/struct.Vec.html#method.with_capacity
https://doc.rust-lang.org/collections/vec/struct.Vec.html#method.set_len
I'm a bit confused. The docs say that with_capacity doesn't change the length, and set_len says that the caller must ensure the vector has the proper length. So is this safe?
The reason I need this is that I was looking for a way to declare a mutable buffer (&mut [T]) of size N, and Vec seems to fit the bill best. I just wanted to avoid requiring my types to implement Clone, which vec![0;n] would bring.
The docs are just a little ambiguously stated. The wording could be better. Your code example is as "safe" as the following stack-equivalent:
let mut arr: [T; N] = mem::uninitialized();
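Which means that as long as you write to an element of the array before reading it you are fine. If you read before writing, you open the door to nasal demons and memory unsafety. Note that mem::uninitialized has since been deprecated in favour of MaybeUninit, and the current set_len documentation requires the elements in old_len..new_len to be initialized before the call, so the safest reading is to write the elements first and only then set the length.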
I just wanted to avoid clone that vec![0;n] would bring.
LLVM will typically optimize this to a single memset.
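One way to follow the "write before you read" rule without a Clone bound is to write into the spare capacity first and only then raise the length. This is a sketch rather than the answerer's method: filled is a hypothetical helper, u32 is used only to keep the example short, and it assumes a toolchain where Vec::spare_capacity_mut and MaybeUninit::write are available.

```rust
/// Builds a vector of `n` elements where element `i` is `i * 2`, writing each
/// slot before the length is raised so no uninitialized value is ever read.
fn filled(n: usize) -> Vec<u32> {
    let mut v: Vec<u32> = Vec::with_capacity(n);

    // The spare capacity is exposed as `MaybeUninit<u32>` slots. The
    // allocation may be larger than requested, so only the first n are used.
    for (i, slot) in v.spare_capacity_mut().iter_mut().take(n).enumerate() {
        slot.write(i as u32 * 2);
    }

    // SAFETY: n <= capacity, and the first n slots were written above.
    unsafe { v.set_len(n) };
    v
}

fn main() {
    println!("{:?}", filled(4));
}
```

For types that can be produced by a closure, Vec::resize_with is a fully safe alternative that also avoids Clone.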
If by "I need my Vector to be of size N" you mean you need memory to be allocated for 10 elements, with_capacity is already doing that.
If you mean you want to have a vector with length 10 (not sure why you would, though...) you need to initialize it with an initial value.
i.e.:
// Allocate room in memory for 10 elements. The vector has an initial
// capacity of at least 10; its length is the number of elements you
// push into it (initially 0).
let mut v: Vec<i32> = Vec::with_capacity(10);
v.push(1); // now length is 1, capacity is unchanged
vs
let mut v: Vec<i32> = vec![0; 10]; // create a vector with 10 elements
// initialized to 0. You can mutate
// those in place later.
// At this point, length = capacity = 10
v[0] = 1; // mutating first element to 1.
// length and capacity are both still 10
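The difference between the two snippets shows up as soon as you ask for an element. A small sketch (both function names are invented here) that reports the length, whether the capacity is at least 10, and the first element if there is one:

```rust
/// Length, "capacity is at least 10", and first element for a vector that
/// only reserved space.
fn reserved_only() -> (usize, bool, Option<i32>) {
    let v: Vec<i32> = Vec::with_capacity(10);
    (v.len(), v.capacity() >= 10, v.get(0).copied())
}

/// The same three facts for a vector that actually holds ten zeros.
fn zero_filled() -> (usize, bool, Option<i32>) {
    let v: Vec<i32> = vec![0; 10];
    (v.len(), v.capacity() >= 10, v.get(0).copied())
}

fn main() {
    println!("with_capacity(10): {:?}", reserved_only());
    println!("vec![0; 10]:       {:?}", zero_filled());
}
```

Indexing with v[0] on the first vector would panic, which is the safe-Rust counterpart of reading memory that was reserved but never initialized.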

When should we use reserve() of vector?

I always use resize() because when I use reserve() I get the error: vector subscript out of range. From what I've read about the differences between resize() and reserve(), reserve() sets the maximum number of elements that can be allocated, while resize() sets what we currently have. In my code I know the maximum number of elements, but reserve() doesn't give me anything useful. So, how can I make use of reserve()?
A vector has a capacity (as returned by capacity()) and a size (as returned by size()). The first states how many elements the vector can hold without reallocating, the second how many it currently holds.
resize changes the size, reserve only changes the capacity.
See also the resize and reserve documentation.
As for the use cases:
Let's say you know beforehand how many elements you want to put into your vector, but you don't want to initialize them - that's the use case for reserve. Let's say your vector was empty before. Directly after reserve(), before doing any insert or push_back, you cannot access as many elements as you reserved space for. Doing so triggers the mentioned error (subscript out of range), since the elements you are trying to access do not exist yet; the size is still 0 and the vector is still empty. But if you choose the reserved capacity so that it is greater than or equal to the maximum size your vector will reach, you avoid expensive reallocations, and at the same time you avoid the (in some cases expensive) initialization of each vector element that resize would do.
With resize, on the other hand, you say: make the vector hold as many elements as I gave as an argument; initialize those whose indices exceed the old size, or remove the ones beyond the given new size.
Note that reserve will never affect the elements currently in the vector (except their storage location if reallocation is needed - but not their values or their number)! Meaning that if the size of a vector is currently greater than what you pass to a call to the reserve function on that same vector, reserve will just do nothing.
See also the answer to this question: Choice between vector::resize() and vector::reserve()
reserve() is a performance optimization for using std::vector.
A typical std::vector implementation would reserve some memory on the first push_back(), for example 4 elements. When the 5th element gets pushed, the vector has to grow: new memory has to be allocated (the capacity is often multiplied by 1.5 or 2), the contents of the vector have to be copied or moved to the new location, and the old memory has to be freed.
This becomes an expensive operation when the vector holds a lot of elements. For example, when you push_back() the (2^24 + 1)th element, about 16 million elements get copied just to add one element.
If you know the number of elements in advance you can reserve() the number of elements you are planning to push_back(). In this case expensive copy operations are not necessary because the memory is already reserved for the amount needed.
resize() in contrast changes the number of elements in the vector.
If no elements are added and you use resize(20), 20 elements will now be accessible. The amount of memory allocated will also increase to an implementation-dependent value.
If 50 elements are added and you use resize(20), the last 30 elements will be removed from the vector and will no longer be accessible. This doesn't necessarily change the memory allocated, but this may also be implementation-dependent.
resize(n) makes sure there is memory for n objects and value-initializes any newly added ones.
reserve() allocates the memory but does not initialize. Hence, reserve won't change the value returned by size(), but it will change the result of capacity().
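The same size-versus-capacity split exists in Rust's Vec, so here is a rough analogue in Rust rather than C++ (the function names are made up). One difference to be aware of: Vec::reserve takes the number of additional elements, whereas std::vector::reserve takes the total capacity.

```rust
/// (length, capacity is at least 20) after only reserving space.
fn after_reserve() -> (usize, bool) {
    let mut v: Vec<i32> = Vec::new();
    v.reserve(20); // capacity grows, but the vector is still empty
    (v.len(), v.capacity() >= 20)
}

/// (length, capacity is at least 20) after resizing to 20 elements.
fn after_resize() -> (usize, bool) {
    let mut v: Vec<i32> = Vec::new();
    v.resize(20, 0); // 20 elements now exist, each initialized to 0
    (v.len(), v.capacity() >= 20)
}

fn main() {
    println!("reserve: {:?}", after_reserve());
    println!("resize:  {:?}", after_resize());
}
```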
Description of how these functions are implemented in VS2015
VS2015 CTP6
This error dialog exists only in DEBUG mode, when _ITERATOR_DEBUG_LEVEL == 2. In RELEASE mode no error is reported (although accessing elements beyond size() is still undefined behavior). We get the current value via return (*(this->_Myfirst() + _Pos));, so the size value isn't needed:
reference operator[](size_type _Pos)
{ // subscript mutable sequence
#if _ITERATOR_DEBUG_LEVEL == 2
if (size() <= _Pos)
{ // report error
_DEBUG_ERROR("vector subscript out of range");
_SCL_SECURE_OUT_OF_RANGE;
}
#elif _ITERATOR_DEBUG_LEVEL == 1
_SCL_SECURE_VALIDATE_RANGE(_Pos < size());
#endif /* _ITERATOR_DEBUG_LEVEL */
return (*(this->_Myfirst() + _Pos));
}
If we look at the vector's source code, we find that the main difference between resize and reserve is that resize() changes the value of this->_Mylast().
reserve() calls _Reallocate.
resize() calls _Reserve, which calls _Reallocate; resize() then also fills the new elements and changes the value of this->_Mylast(): this->_Mylast() += _Newsize - size(); which is used in the size calculation (see the last function).
void resize(size_type _Newsize)
{ // determine new length, padding as needed
if (_Newsize < size())
_Pop_back_n(size() - _Newsize);
else if (size() < _Newsize)
{ // pad as needed
_Reserve(_Newsize - size());
_TRY_BEGIN
_Uninitialized_default_fill_n(this->_Mylast(), _Newsize - size(),
this->_Getal());
_CATCH_ALL
_Tidy();
_RERAISE;
_CATCH_END
this->_Mylast() += _Newsize - size();
}
}
void reserve(size_type _Count)
{ // determine new minimum length of allocated storage
if (capacity() < _Count)
{ // something to do, check and reallocate
if (max_size() < _Count)
_Xlen();
_Reallocate(_Count);
}
}
void _Reallocate(size_type _Count)
{ // move to array of exactly _Count elements
pointer _Ptr = this->_Getal().allocate(_Count);
_TRY_BEGIN
_Umove(this->_Myfirst(), this->_Mylast(), _Ptr);
_CATCH_ALL
this->_Getal().deallocate(_Ptr, _Count);
_RERAISE;
_CATCH_END
size_type _Size = size();
if (this->_Myfirst() != pointer())
{ // destroy and deallocate old array
_Destroy(this->_Myfirst(), this->_Mylast());
this->_Getal().deallocate(this->_Myfirst(),
this->_Myend() - this->_Myfirst());
}
this->_Orphan_all();
this->_Myend() = _Ptr + _Count;
this->_Mylast() = _Ptr + _Size;
this->_Myfirst() = _Ptr;
}
void _Reserve(size_type _Count)
{ // ensure room for _Count new elements, grow exponentially
if (_Unused_capacity() < _Count)
{ // need more room, try to get it
if (max_size() - size() < _Count)
_Xlen();
_Reallocate(_Grow_to(size() + _Count));
}
}
size_type size() const _NOEXCEPT
{ // return length of sequence
return (this->_Mylast() - this->_Myfirst());
}
Problems
But some problems exist with reserve:
end() will be equal to begin()
23.2.1 General container requirements
5:
end() returns an iterator which is the past-the-end value for the container.
iterator end() _NOEXCEPT
{ // return iterator for end of mutable sequence
return (iterator(this->_Mylast(), &this->_Get_data()));
}
i.e. _Mylast() will be equal to _Myfirst()
at() will generate an out_of_range exception.
23.2.3 Sequence containers
17:
The member function at() provides bounds-checked access to container elements. at() throws out_of_range if n >= a.size().
In the Visual Studio debugger, the vector's values are visible only when its size isn't 0. The original post showed two screenshots here: one with resize, and one with reserve and a manually set #define _ITERATOR_DEBUG_LEVEL 0.
