Write a struct with multiple variables into a single byte array - pointers

I want to create a byte storage for the following structure and put both variables (tag and i/d) into the storage. Then I want to write the storage into the TcpStream in a single write; due to a bug on the server side, I can't do it with multiple calls.
I need to use a C-style API that involves unions. The total structure size is 12 bytes:
struct element {
    int tag;
    union {
        int i;
        double d;
    } data;
};
How do I do it? Maybe there's a better way of doing it?

Something like this should be a good start.
You need to use the byteorder crate to handle endianness and write bytes to a stream.
The element type can be conveniently represented as an enum on the Rust side:
extern crate byteorder;

use std::net::TcpStream;
use std::io::Write;
use byteorder::{ByteOrder, BigEndian, WriteBytesExt};

enum Element {
    A(i32),
    B(f64),
}

impl Element {
    fn write_to_buffer<T: ByteOrder>(&self, buffer: &mut [u8; 12]) {
        let mut buffer = &mut buffer[..];
        match *self {
            Element::A(n) => {
                buffer.write_i32::<T>(0).unwrap();
                buffer.write_i32::<T>(n).unwrap();
            },
            Element::B(n) => {
                buffer.write_i32::<T>(1).unwrap();
                buffer.write_f64::<T>(n).unwrap();
            },
        }
    }
}

fn main() {
    let mut stream = TcpStream::connect("127.0.0.1:1234").unwrap();
    let mut buffer = [0u8; 12];

    let b = Element::B(317.98);
    b.write_to_buffer::<BigEndian>(&mut buffer);
    stream.write(&buffer).unwrap();
}
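For completeness, the read side can use the same crate's ByteOrder methods. This is only a hedged sketch that assumes the 12-byte layout above (a 4-byte tag followed by an 8-byte payload area), big-endian encoding, and the Element enum defined above; read_element is an illustrative name, not part of any API.
fn read_element(buffer: &[u8; 12]) -> Option<Element> {
    // the tag occupies the first 4 bytes, the payload the rest
    match BigEndian::read_i32(&buffer[0..4]) {
        0 => Some(Element::A(BigEndian::read_i32(&buffer[4..8]))),
        1 => Some(Element::B(BigEndian::read_f64(&buffer[4..12]))),
        _ => None,
    }
}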

Related

How does Rust model iterators? Stack or Heap?

I know that vectors in Rust are allocated on the heap, while the pointer, capacity, and length of the vector are stored on the stack.
Let's say I have the following vector:
let vec = vec![1, 2, 3];
If I make an iterator from this vector:
let vec_iter = vec.iter();
How does Rust model this iterator in terms of allocation on the heap vs. stack? Is it the same as the vector?
Most iterators are stack allocated.
In the case of Vec::iter(), the iterator holds two pointers, one to the first element and one just past the end, like so:
use std::marker::PhantomData;

pub struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>,
}
Since a raw pointer conveys neither ownership nor a lifetime, PhantomData<&'a T> tells the compiler that this struct holds a reference of lifetime 'a to values of type T.
Iter::next looks somewhat like this:
impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<Self::Item> {
        // pointer dereferencing and offsetting are only allowed in unsafe code
        unsafe {
            if self.ptr == self.end {
                None
            } else {
                let old = self.ptr;
                self.ptr = self.ptr.offset(1);
                Some(&*old)
            }
        }
    }
}
And a new Iter is created like so
impl<'a, T: 'a> Iter<'a, T> {
    pub fn new(slice: &'a [T]) -> Self {
        // this simplified version doesn't handle zero-sized types
        assert_ne!(std::mem::size_of::<T>(), 0);
        let start = slice.as_ptr();
        Iter {
            ptr: start,
            end: unsafe { start.add(slice.len()) },
            _marker: PhantomData,
        }
    }
}
Now we can use it like any other iterator:
let v = vec!['a', 'b', 'c', 'd', 'e'];
for c in Iter::new(&v) {
    println!("{c}");
}
And thanks to PhantomData, the compiler can guard us against use after free and other memory issues.
let iter = {
    let v = vec!['a', 'b', 'c', 'd', 'e'];
    Iter::new(&v) // error! borrowed value does not live long enough
};
for c in iter {
    println!("{c}");
}
"the pointer, capacity, and length of the vector are stored on the stack"
Not really. They are stored wherever the user wishes, which may be on the stack, in the global data segment, or on the heap:
// In the global data segment
static VEC: Vec<()> = Vec::new();

struct Foo {
    v: Vec<()>,
}

fn main() {
    // On the stack
    let v: Vec<()> = Vec::new();

    // On the heap
    let f = Box::new(Foo { v: Vec::new() });
}
And the same goes for iterators. Most of the time they simply hold references to the original data, wherever that may be, and the references themselves are stored inside the iterator struct wherever the user puts it.
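To make that concrete, here is a small sketch; the printed sizes are an assumption about a typical 64-bit target (slice::Iter is essentially the two-pointer struct shown above), not a guarantee.
use std::mem::size_of_val;

fn main() {
    let v = vec![1, 2, 3];

    // The iterator is a small plain value; here it lives on the stack.
    let on_stack = v.iter();
    println!("iterator size: {} bytes", size_of_val(&on_stack));

    // Nothing stops you from moving the same kind of iterator to the heap.
    let on_heap = Box::new(v.iter());
    println!("boxed iterator size: {} bytes", size_of_val(&*on_heap));
}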

Dropping buffer when writing custom Vec

Reading the Rustonomicon, I found this IntoIter type from its implementation of a custom Vec<T>:
pub struct IntoIter<T> {
    buf: NonNull<T>,
    cap: usize,
    start: *const T,
    end: *const T,
    _marker: PhantomData<T>,
}
and its impl of IntoIterator:
impl<T> IntoIterator for Vec<T> {
    type Item = T;
    type IntoIter = IntoIter<T>;

    fn into_iter(self) -> IntoIter<T> {
        // Can't destructure Vec since it's Drop
        let ptr = self.ptr;
        let cap = self.cap;
        let len = self.len;

        // Make sure not to drop Vec since that would free the buffer
        mem::forget(self);

        unsafe {
            IntoIter {
                buf: ptr,
                cap: cap,
                start: ptr.as_ptr(),
                end: if cap == 0 {
                    // can't offset off this pointer, it's not allocated!
                    ptr.as_ptr()
                } else {
                    ptr.as_ptr().add(len)
                },
                _marker: PhantomData,
            }
        }
    }
}
I want to ask about this specific line:
// Make sure not to drop Vec since that would free the buffer
mem::forget(self);
In which case could the buffer be freed? More specifically, in that piece of code? Or could it be freed in some other implementation which, when called first, frees the buffer and then, with the buffer already gone, causes a use-after-free error?
Would it be enough to just call mem::forget(self.buf)?
Without the forget, the self object would be dropped when that function returns, and its Drop implementation would run. That implementation frees the buffer, which is not what we want, because the IntoIter is still using it.
Using forget, we get rid of the self object without running its destructor, letting the IntoIter object semantically take ownership of the buffer, use it, and free it when it's done with it. Forgetting only self.buf wouldn't be enough: NonNull<T> has no destructor of its own, so there would be nothing to suppress, and self would still be dropped at the end of the function, freeing the buffer anyway.
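As a minimal illustration of what mem::forget does (using a hypothetical type with an observable destructor, not the Nomicon's Vec itself):
use std::mem;

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropped (this is where the buffer would be freed)");
    }
}

fn main() {
    let a = Noisy;
    drop(a); // destructor runs, message is printed

    let b = Noisy;
    mem::forget(b); // destructor never runs, nothing is printed
}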

Is there a way to convert a ChunksMut<T> item from Vec::chunks_mut to a slice &mut [T]?

I'm filling a vector in parallel, but for this generalized question, I've only found hints and no answers.
The code below works, but I want to switch to Rng::fill instead of iterating over each chunk. It might not be possible to have multiple mutable slices inside a single Vec; I'm not sure.
extern crate rayon;
extern crate rand;
extern crate rand_xoshiro;

use rand::{Rng, SeedableRng};
use rand_xoshiro::Xoshiro256StarStar;
use rayon::prelude::*;
use std::{iter, env};
use std::sync::{Arc, Mutex};

// i16 because I was filling up my RAM for large tests...
fn gen_rand_vec(data: &mut [i16]) {
    let num_threads = rayon::current_num_threads();
    let mut rng = rand::thread_rng();
    let mut prng = Xoshiro256StarStar::from_rng(&mut rng).unwrap();

    // lazy iterator of fast, unique RNGs
    // Arc and Mutex are just so it can be accessed from multiple threads
    let rng_it = Arc::new(Mutex::new(iter::repeat(()).map(|()| {
        let new_prng = prng.clone();
        prng.jump();
        new_prng
    })));

    // generates random numbers for each chunk in parallel
    // par_chunks_mut is the parallel version of chunks_mut
    data.par_chunks_mut(data.len() / num_threads).for_each(|chunk| {
        // I used extra braces because it might be required to unlock the Mutex.
        // Not sure.
        let mut prng = { rng_it.lock().unwrap().next().unwrap() };
        for i in chunk.iter_mut() {
            *i = prng.gen_range(-1024, 1024);
        }
    });
}
It turns out that a ChunksMut iterator gives slices. I'm not sure how to glean that from the documentation. I figured it out by reading the source:
#[derive(Debug)]
#[stable(feature = "rust1", since = "1.0.0")]
pub struct ChunksMut<'a, T: 'a> {
    v: &'a mut [T],
    chunk_size: usize,
}

#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, T> Iterator for ChunksMut<'a, T> {
    type Item = &'a mut [T];

    #[inline]
    fn next(&mut self) -> Option<&'a mut [T]> {
        if self.v.is_empty() {
            None
        } else {
            let sz = cmp::min(self.v.len(), self.chunk_size);
            let tmp = mem::replace(&mut self.v, &mut []);
            let (head, tail) = tmp.split_at_mut(sz);
            self.v = tail;
            Some(head)
        }
    }
}
I guess it's just experience; to others it must be obvious that the ChunksMut<T> iterator from a Vec<T> yields items of type &mut [T]. That makes sense now. It just wasn't very clear with the intermediate struct.
pub fn chunks_mut(&mut self, chunk_size: usize) -> ChunksMut<T>
// ...
impl<'a, T> Iterator for ChunksMut<'a, T>
Reading this, it looked like the iterator returned items of type T, the same as Vec<T>::iter(), which wouldn't make sense.
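Since each item yielded by chunks_mut (or rayon's par_chunks_mut) is already a &mut [T], it can be passed straight to anything that takes a mutable slice, with no conversion. A minimal sketch, where takes_slice is an illustrative stand-in for something like Rng::fill:
// stand-in for e.g. prng.fill(chunk); the point is only the parameter type
fn takes_slice(s: &mut [i16]) {
    for x in s.iter_mut() {
        *x = 7;
    }
}

fn main() {
    let mut data = vec![0i16; 8];
    for chunk in data.chunks_mut(4) {
        takes_slice(chunk); // chunk already is a &mut [i16]
    }
    assert!(data.iter().all(|&x| x == 7));
}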

How to get the index of an element in a vector using pointer arithmetic?

In C you can use the pointer offset to get the index of an element within an array, e.g.:
index = element_pointer - &vector[0];
Given a reference to an element in an array, this should be possible in Rust too.
Rust can take the memory addresses of vector elements, convert them to usize, and subtract them, but is there a more convenient or idiomatic way to do this?
There isn't a simpler way. I think the reason is that it would be hard to guarantee that any operation or method giving you that answer was only used with the Vec (or, more likely, the slice) that actually contains the element; Rust wouldn't want to let you call it with a reference into a different vector.
More idiomatic would be to avoid needing to do it in the first place. You can't store references into a Vec anywhere very permanent away from the Vec anyway, due to lifetimes, so you're likely to still have the index handy when you've got the reference.
In particular, when iterating you'd use enumerate to iterate over (index, &item) pairs, as in the sketch below.
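A quick sketch of the enumerate approach:
fn main() {
    let v = vec!["a", "b", "c"];
    for (index, item) in v.iter().enumerate() {
        println!("{} is at index {}", item, index);
    }
}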
So, given the issues which people have brought up with the pointers and such, the best way, in my opinion, to do this is:
fn index_of_unchecked<T>(slice: &[T], item: &T) -> usize {
    if std::mem::size_of::<T>() == 0 {
        return 0; // do what you will with this case
    }
    (item as *const _ as usize - slice.as_ptr() as usize)
        / std::mem::size_of::<T>()
}

// note: for zero-sized types here, you would
// return Some(0) if item as *const T == slice.as_ptr()
// and None otherwise
fn index_of<T>(slice: &[T], item: &T) -> Option<usize> {
    let ptr = item as *const T;
    let start = slice.as_ptr();
    // one-past-the-end pointer; computing it needs unsafe
    let end = unsafe { start.add(slice.len()) };
    if start <= ptr && ptr < end {
        Some(index_of_unchecked(slice, item))
    } else {
        None
    }
}
although, if you want methods:
trait IndexOfExt<T> {
    fn index_of_unchecked(&self, item: &T) -> usize;
    fn index_of(&self, item: &T) -> Option<usize>;
}

impl<T> IndexOfExt<T> for [T] {
    fn index_of_unchecked(&self, item: &T) -> usize {
        // ...
    }
    fn index_of(&self, item: &T) -> Option<usize> {
        // ...
    }
}
and then you'll be able to use these methods on any type that derefs to [T].
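A quick usage sketch of the free functions above:
fn main() {
    let v = vec![10, 20, 30, 40];
    let third = &v[2];
    assert_eq!(index_of(&v, third), Some(2));
    assert_eq!(index_of_unchecked(&v, third), 2);

    // a reference that doesn't point into v's buffer yields None
    let outside = 20;
    assert_eq!(index_of(&v, &outside), None);
}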

Owned pointer in graph structure

With the generous help of the Rust community I managed to get the base of a topological data structure assembled using managed pointers. This came together rather nicely and I was pretty excited about Rust in general. Then I read this post (which seems like a reasonable plan), and it inspired me to backtrack and try to re-assemble the structure using only owned pointers, if possible.
This is the working version using managed pointers:
struct Dart<T> {
    alpha: ~[@mut Dart<T>],
    embed: ~[@mut T],
    tagged: bool
}

impl<T> Dart<T> {
    pub fn new(dim: uint) -> @mut Dart<T> {
        let mut dart = @mut Dart{alpha: ~[], embed: ~[], tagged: false};
        dart.alpha = vec::from_elem(dim, dart);
        return dart;
    }

    pub fn get_dim(&self) -> uint {
        return self.alpha.len();
    }

    pub fn traverse(@mut self, invs: &[uint], f: &fn(&Dart<T>)) {
        let dim = self.get_dim();
        for invs.each |i| { if *i >= dim { return } }; // test bounds on invs vec
        if invs.len() == 2 {
            let spread: int = int::abs(invs[1] as int - invs[0] as int);
            if spread == 1 { // simple loop
                let mut dart = self;
                let mut i = invs[0];
                while !dart.tagged {
                    dart.tagged = true;
                    f(dart);
                    dart = dart.alpha[i];
                    if i == invs[0] { i = invs[1]; }
                    else { i = invs[0]; }
                }
            }
            // else if spread == 2 { // max 4 cells traversed
            // }
        } else {
            let mut stack = ~[self];
            self.tagged = true;
            while !stack.is_empty() {
                let mut dart = stack.pop();
                f(dart);
                for invs.each |i| {
                    if !dart.alpha[*i].tagged {
                        dart.alpha[*i].tagged = true;
                        stack.push(dart);
                    }
                }
            }
        }
    }
}
After a few hours of chasing lifetime errors, I have come to the conclusion that this may not even be possible with owned pointers, due to the cyclic nature of the structure (without tying the knot, as I was warned). My feeble attempt at this is below. My question: is this structure possible to implement without resorting to managed pointers? And if not, is the code above considered reasonably "rusty" (idiomatic Rust)? Thanks.
struct GMap<'self, T> {
    dim: uint,
    darts: ~[~Dart<'self, T>]
}

struct Dart<'self, T> {
    alpha: ~[&'self mut Dart<'self, T>],
    embed: ~[&'self mut T],
    tagged: bool
}

impl<'self, T> GMap<'self, T> {
    pub fn new_dart(&'self mut self) {
        let mut dart = ~Dart{alpha: ~[], embed: ~[], tagged: false};
        let dartRef: &'self mut Dart<'self, T> = dart;
        dartRef.alpha = vec::from_elem(self.dim, copy dartRef);
        self.darts.push(dart);
    }
}
I'm pretty sure that using &mut pointers is impossible here, since one can only have one such pointer to a given value in existence at a time, e.g.:
fn main() {
    let mut i = 0;
    let a = &mut i;
    let b = &mut i;
}
and-mut.rs:4:12: 4:18 error: cannot borrow `i` as mutable more than once at a time
and-mut.rs:4     let b = &mut i;
                         ^~~~~~
and-mut.rs:3:12: 3:18 note: second borrow of `i` as mutable occurs here
and-mut.rs:3     let a = &mut i;
                         ^~~~~~
error: aborting due to previous error
One could get around the borrow checker by either storing unsafe pointers to the memory (ptr::to_mut_unsafe_ptr) or by storing indices into the darts member of GMap. Essentially, a single owning reference to the memory is stored (in self.darts), and all operations have to go through it.
This might look like:
impl<'self, T> GMap<'self, T> {
    pub fn new_dart(&'self mut self) {
        let ind = self.darts.len();
        self.darts.push(~Dart{alpha: vec::from_elem(self.dim, ind), embed: ~[], tagged: false});
    }
}
traverse would need to change to either be a method on GMap (e.g. fn(&mut self, node_ind: uint, invs: &[uint], f: &fn(&Dart<T>))), or at least take a GMap type.
(On an entirely different note, there is library support for external iterators, which are far more composable than the internal iterators (the ones that take a closure). So defining one of these for traverse may (or may not) make using it nicer.)
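For readers coming from current Rust, here is a minimal sketch of the same index-based idea in modern syntax; the names mirror the snippets above, but the details are illustrative rather than a faithful port.
// Darts refer to each other by index into GMap::darts instead of by pointer.
struct Dart {
    alpha: Vec<usize>,
    tagged: bool,
}

struct GMap {
    dim: usize,
    darts: Vec<Dart>,
}

impl GMap {
    fn new_dart(&mut self) -> usize {
        let ind = self.darts.len();
        self.darts.push(Dart {
            // each slot initially refers back to the new dart itself
            alpha: vec![ind; self.dim],
            tagged: false,
        });
        ind
    }
}

fn main() {
    let mut map = GMap { dim: 2, darts: Vec::new() };
    let d = map.new_dart();
    assert_eq!(map.darts[d].alpha, vec![d, d]);
}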
