How would you go about creating a stack-allocated vector-like container with some fixed upper limit on the number of elements it can contain? You can see my attempt at this below, but it doesn't compile:
// The following is at crate level
#![feature(unsafe_destructor)]
use std::mem;
use std::ptr;
use std::slice::Iter;
pub struct StackVec<T> {
buf: [T; 10],
len: usize,
}
impl<T> StackVec<T> {
pub fn new() -> StackVec<T> {
StackVec {
buf: unsafe { mem::uninitialized() },
len: 0,
}
}
pub fn iter(&self) -> Iter<T> {
(&self.buf[..self.len]).iter()
}
pub fn push(&mut self, value: T) {
unsafe { ptr::write(self.buf.get_mut(self.len).unwrap(), value); }
self.len += 1;
}
pub fn pop(&mut self) -> Option<T> {
if self.len == 0 {
None
} else {
unsafe {
self.len -= 1;
Some(ptr::read(self.buf.get(self.len).unwrap()))
}
}
}
}
#[unsafe_destructor]
impl<T> Drop for StackVec<T>
where T: Drop
{
fn drop(&mut self) {
for elem in self.iter() {
unsafe { ptr::read(elem); }
}
unsafe { mem::forget(self.buf); } // ERROR: [1]
}
}
This is the compile-time error I get:
[1] error: cannot move out of type stackvec::StackVec<T>, which defines the Drop trait
I've written an implementation, and I'll go over the highlights.
Full code is available at crates.io/arrayvec (API doc)
Use a trait (called Array) to abstract over different array sizes. It needs to provide raw pointers so that we can use the array as backing storage.
/// Trait for fixed size arrays.
pub unsafe trait Array {
/// The array's element type
type Item;
unsafe fn new() -> Self;
fn as_ptr(&self) -> *const Self::Item;
fn as_mut_ptr(&mut self) -> *mut Self::Item;
fn capacity() -> usize;
}
In contemporary Rust, we can only implement this trait for specific array sizes, so I cover a selection of small sizes with a macro:
macro_rules! fix_array_impl {
($len:expr ) => (
unsafe impl<T> Array for [T; $len] {
type Item = T;
/// Note: Returning an uninitialized value here only works
/// if we can be sure the data is never used. The nullable pointer
/// inside enum optimization conflicts with this, for example,
/// so we need to be extra careful. See `Flag` enum.
unsafe fn new() -> [T; $len] { mem::uninitialized() }
fn as_ptr(&self) -> *const T { self as *const _ as *const _ }
fn as_mut_ptr(&mut self) -> *mut T { self as *mut _ as *mut _}
fn capacity() -> usize { $len }
}
)
}
macro_rules! fix_array_impl_recursive {
() => ();
($len:expr, $($more:expr,)*) => (
fix_array_impl!($len);
fix_array_impl_recursive!($($more,)*);
);
}
fix_array_impl_recursive!(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 40, 48, 56, 64, 72, 96, 128, 160, 192, 224,);
We need to suppress the default drop of the embedded array. In theory, you can do this by storing an Option<Array> and using ptr::write to overwrite it with None at the last moment in Drop.
We must, however, use our own enum similar to Option, for one reason: we need to avoid the non-nullable pointer optimization that applies to enums with the same representation as Option. Then, in Drop, we do the crucial inhibition of the inner array's default destructor: we forcibly overwrite our enum, but only after destructing all the elements, of course.
/// Make sure the non-nullable pointer optimization does not occur!
#[repr(u8)]
enum Flag<T> {
Dropped,
Alive(T),
}
/// A vector with a fixed capacity.
pub struct ArrayVec<A: Array> {
len: u8,
xs: Flag<A>,
}
impl<A: Array> Drop for ArrayVec<A> {
fn drop(&mut self) {
// clear all elements, then inhibit drop of inner array
while let Some(_) = self.pop() { }
unsafe {
ptr::write(&mut self.xs, Flag::Dropped);
}
}
}
We implement Deref<Target=[T]> and DerefMut and get tons of slice methods for free. This is a great feature of Rust!
impl<A: Array> Deref for ArrayVec<A> {
type Target = [A::Item];
fn deref(&self) -> &[A::Item] {
unsafe {
slice::from_raw_parts(self.inner_ref().as_ptr(), self.len())
}
}
}
The ArrayVec type has an invariant, that the Flag<A> is always Flag::Alive(A) when the value is alive. We should be able to optimize with this in mind. (A FIXME is marked there.)
fn inner_mut(&mut self) -> &mut A {
// FIXME: Optimize this, we know it's always present.
match self.xs {
Flag::Alive(ref mut xs) => xs,
_ => unreachable!(),
}
}
Thank you kmky for asking this question! Exploring this answer led to the creation of arrayvec, linked above, and uncovered some of the points that were very important for making it a safe Rust data structure.
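On newer compilers, mem::uninitialized is deprecated, and the same idea (storage whose elements are never dropped automatically) can be expressed with std::mem::MaybeUninit and const generics. The following is only a minimal sketch under that assumption, not arrayvec's actual implementation; the name FixedVec and its methods are hypothetical.

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity vector; `N` is the capacity.
pub struct FixedVec<T, const N: usize> {
    buf: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> FixedVec<T, N> {
    pub fn new() -> Self {
        FixedVec {
            // Every slot starts uninitialized; no `T` is constructed here.
            buf: std::array::from_fn(|_| MaybeUninit::uninit()),
            len: 0,
        }
    }

    /// Hands the value back if the vector is already full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.buf[self.len].write(value);
        self.len += 1;
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // SAFETY: every slot below the old `len` was initialized by `push`,
        // and decrementing `len` first ensures the slot is never read twice.
        Some(unsafe { self.buf[self.len].assume_init_read() })
    }

    pub fn len(&self) -> usize {
        self.len
    }
}

impl<T, const N: usize> Drop for FixedVec<T, N> {
    fn drop(&mut self) {
        // `MaybeUninit` never drops its contents, so only the live
        // elements are dropped here, one by one.
        while self.pop().is_some() {}
    }
}
```

Because MaybeUninit<T> has no destructor of its own, no Flag-style wrapper is needed in this variant: the array can be dropped normally once the live elements have been popped.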
My guess is that the compiler doesn't know which elements of the array are "free" and which need a destructor to run when the array is dropped.
Try storing Option<T>, which has a .take() method that will allow you to move an element out of the array.
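A rough sketch of that suggestion, assuming the same capacity of 10 as in the question. The type name OptionStackVec is hypothetical, and std::array::from_fn needs a reasonably recent compiler. No unsafe code or Drop impl is required, at the cost of storing a discriminant per slot for most element types.

```rust
/// Hypothetical safe variant of the question's type: every slot is an `Option<T>`.
pub struct OptionStackVec<T> {
    buf: [Option<T>; 10],
    len: usize,
}

impl<T> OptionStackVec<T> {
    pub fn new() -> Self {
        OptionStackVec {
            // All slots start out as `None`, so nothing is uninitialized.
            buf: std::array::from_fn(|_| None),
            len: 0,
        }
    }

    /// Hands the value back if all slots are occupied.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == self.buf.len() {
            return Err(value);
        }
        self.buf[self.len] = Some(value);
        self.len += 1;
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // `take` moves the element out and leaves `None` behind.
        self.buf[self.len].take()
    }

    pub fn iter(&self) -> impl Iterator<Item = &T> {
        // Slots below `len` are always `Some`, so `flatten` skips nothing.
        self.buf[..self.len].iter().flatten()
    }
}
```

When the vector is dropped, the compiler-generated drop of the array handles each Option<T> correctly, because every slot always holds a valid value.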
#![feature(ptr_internals)]
use core::ptr::Unique;
struct PtrWrapper {
id: usize,
self_reference: Unique<Self>
}
impl PtrWrapper {
fn new() -> Self {
let dummy = unsafe {Unique::new_unchecked(std::ptr::null_mut::<PtrWrapper>())};
let mut ret = Self {id:0, self_reference: dummy };
let new_ptr = &mut ret as *mut Self;
debug_print(new_ptr);
ret.self_reference = Unique::new(new_ptr).unwrap();
debug_print(ret.self_reference.as_ptr());
ret
}
fn get_id(&self) -> usize {
self.id.clone()
}
}
fn main() {
println!("START");
let mut wrapper = PtrWrapper::new();
wrapper.id = 10;
let ptr = wrapper.self_reference.as_ptr();
unsafe {
(*ptr).id += 30;
println!("The next print isn't 40? Garbage bytes");
debug_print(ptr);
let tmp = &mut wrapper as *mut PtrWrapper;
(*tmp).id += 500;
println!("The next print isn't 540?");
debug_print(tmp);
}
println!("Below debug_print is proof of undefined behavior! Garbage bytes\n");
debug_print(wrapper.self_reference.as_ptr());
debug_print(&mut wrapper as *mut PtrWrapper);
debug_print_move(wrapper);
println!("Why is the assertion below false?");
assert_eq!(unsafe{(*ptr).id}, 540);
}
fn debug_print_move(mut wrapper: PtrWrapper) {
debug_print(&mut wrapper as *mut PtrWrapper);
}
fn debug_print(ptr: *mut PtrWrapper) {
println!("Address: {:p}", ptr);
println!("ID: {}\n", unsafe {(*ptr).get_id()});
}
The above code should compile on the Rust Playground with the nightly channel selected. Pay attention to the console output.
My question is: why are the intermediate results not equal to the values I expect? In the code above, there is no simultaneous access from multiple threads (it is single-threaded), so there aren't any data races. There are, however, implicitly multiple mutable versions of the object existing on the stack.
As expected, the memory location of the pointer changes with the tmp variable as well as when the entire object is moved into debug_print_move. Using the tmp pointer appears to work as expected (i.e., it adds 500); however, the pointers obtained from the Unique<PtrWrapper> object seem to point to irrelevant locations in memory.
As Stargateur recommended, in order to solve this problem we need to Pin the object which needs to be self-referential. I ended up using:
pin-api = "0.2.1"
in Cargo.toml instead of std::pin::Pin. Next, I set up the struct and its implementation:
#![feature(ptr_internals, pin_into_inner, optin_builtin_traits)]
// not available on rust-playground
extern crate pin_api;
use pin_api::{boxed::PinBox, marker::Unpin, mem::Pin};
///test
pub struct PtrWrapper<T>
where
T: std::fmt::Debug,
{
///tmp
pub obj: T,
/// pinned object
pub self_reference: *mut Self,
}
impl<T> !Unpin for PtrWrapper<T> where T: std::fmt::Debug {}
impl<T> PtrWrapper<T>
where
T: std::fmt::Debug,
{
///test
pub fn new(obj: T) -> Self {
Self {
obj,
self_reference: std::ptr::null_mut(),
}
}
///test
pub fn init(mut self: Pin<PtrWrapper<T>>) {
let mut this: &mut PtrWrapper<T> = unsafe { Pin::get_mut(&mut self) };
this.self_reference = this as *mut Self;
}
/// Debug print
pub fn print_obj(&self) {
println!("Obj value: {:#?}", self.obj);
}
}
Finally, the test function:
fn main2() {
unsafe {
println!("START");
let mut wrapper = PinBox::new(PtrWrapper::new(10));
wrapper.as_pin().init();
let m = wrapper.as_pin().self_reference;
(*m).obj += 30;
println!("The next print is 40");
debug_print(m);
let tmp = wrapper.as_pin().self_reference;
(*tmp).obj += 500;
println!("The next print is 540?");
debug_print(tmp);
debug_print(wrapper.self_reference);
let cpy = PinBox::get_mut(&mut wrapper);
debug_print_move(cpy);
// Check the value before dropping: `m` points into the box and dangles afterwards.
assert_eq!((*m).obj, 540);
std::mem::drop(wrapper);
println!("Works!");
}
}
fn debug_print_move<T>(mut wrapper: &mut PtrWrapper<T>)
where
T: std::fmt::Debug,
{
debug_print(&mut *wrapper as *mut PtrWrapper<T>);
}
fn debug_print<T>(ptr: *mut PtrWrapper<T>)
where
T: std::fmt::Debug,
{
println!("Address: {:p}", ptr);
unsafe { (*ptr).print_obj() };
}
On a side note, pin-api does not exist on rust playground. You could still use std::pin::Pin, however it would require further customization.
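For reference, here is a rough sketch of how the same idea might look with only the standard library, using std::pin::Pin, Box::pin and std::marker::PhantomPinned. The type SelfRef and its methods are hypothetical names, not part of pin-api or of the code above. The key difference from the original question is that the value is placed at its final heap address before the self-pointer is recorded; in the question, new() stored the address of a local variable and then returned the struct by value, so the stored pointer referred to a stack slot that was no longer valid.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr;

/// Hypothetical self-referential type (name and fields are illustrative).
pub struct SelfRef {
    pub id: usize,
    self_reference: *const SelfRef,
    // Opts out of `Unpin`, so safe code cannot move the value out of a `Pin`.
    _pin: PhantomPinned,
}

impl SelfRef {
    /// Allocates on the heap first, then records the final address.
    pub fn new(id: usize) -> Pin<Box<SelfRef>> {
        let mut boxed = Box::pin(SelfRef {
            id,
            self_reference: ptr::null(),
            _pin: PhantomPinned,
        });
        let addr: *const SelfRef = &*boxed;
        // SAFETY: we only assign a field; the value is never moved out of the box.
        unsafe {
            boxed.as_mut().get_unchecked_mut().self_reference = addr;
        }
        boxed
    }

    /// True while the stored pointer still equals the value's own address.
    pub fn points_to_self(&self) -> bool {
        ptr::eq(self.self_reference, self)
    }
}
```

Moving the Pin<Box<SelfRef>> handle around only moves the box pointer, not the heap allocation, so the stored address stays valid for as long as the box is alive.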
I can use resize, but it seems like overkill because I do not need to resize the vector, just modify its values. Using a new variable is not an option, since this vector is actually a field in a struct.
I guess that resize is efficient, and probably the answer to my question, but its name does not carry the meaning of resetting the values without modifying the size.
In C, I would use memset (as opposed to realloc).
Illustration of my question:
let my_vec_size = 42;
let mut my_vec = Vec::new(); // 'my_vec' will always have a size of 42
my_vec.resize(my_vec_size, false); // Set the size to 42, and all values to false
// [ ... ] piece of code where the values in 'my_vec' will be modified, checked, etc ...
// now I need to reuse my_vec.
// Possibility A -> use resize again
my_vec.resize(my_vec_size, false);
// Possibility B -> iterate on the vector to modify its values (long and laborious)
for item in my_vec.iter_mut() {
*item = false;
}
// Possibility C ?
The most efficient way in general is to reset the values themselves (aka B):
for item in &mut my_vec { *item = false; }
For booleans it is not immediately obvious, however for a String it is important to preserve the allocated buffer of each element:
for item in &mut my_vec { item.clear(); }
If discarding and recreating the elements of the Vec is cheap, such as the case of the boolean or if the elements will be overwritten anyway, then a combination of clear and resize is easier:
my_vec.clear();
my_vec.resize(my_vec_size, false);
resize by itself will not work to "reset" values:
const LEN: usize = 3;
fn main() {
let mut values = vec![false; LEN];
values[0] = true;
values.resize(LEN, false);
println!("{:?}", values); // [true, false, false]
}
Just use a for loop:
for v in &mut values {
*v = false;
}
println!("{:?}", values); // [false, false, false]
If that sight offends you, write an extension trait:
trait ResetExt<T: Copy> {
fn reset(&mut self, val: T);
}
impl<T: Copy> ResetExt<T> for [T] {
fn reset(&mut self, value: T) {
for v in self {
*v = value;
}
}
}
values.reset(false);
println!("{:?}", values); // [false, false, false]
The trait idea can be extended so that each value knows how to reset itself, if that makes sense for your situation:
trait ResetExt {
fn reset(&mut self);
}
impl<T: ResetExt> ResetExt for [T] {
fn reset(&mut self) {
for v in self {
v.reset();
}
}
}
impl ResetExt for bool {
fn reset(&mut self) {
*self = false;
}
}
impl ResetExt for String {
fn reset(&mut self) {
self.clear();
}
}
values.reset();
println!("{:?}", values); // [false, false, false]
In C, I would use memset
std::ptr::write_bytes uses memset internally, so you can (almost) precisely translate this code. An example from the Rust documentation:
use std::ptr;
let mut vec = vec![0u32; 4];
unsafe {
let vec_ptr = vec.as_mut_ptr();
ptr::write_bytes(vec_ptr, 0xfe, 2);
}
assert_eq!(vec, [0xfefefefe, 0xfefefefe, 0, 0]);
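If unsafe code is not required, the slice method fill (stable since Rust 1.50) is probably the closest safe counterpart: it overwrites every element in place and leaves the length and allocation alone. Whether it ends up as an actual memset call is up to the optimizer, so treat that part as an assumption. The helper name reset below is made up for illustration.

```rust
/// Hypothetical helper: overwrite every element with a clone of `value`.
/// The slice length (and the Vec's allocation behind it) is not changed.
pub fn reset<T: Clone>(values: &mut [T], value: T) {
    values.fill(value);
}
```

With the vector from the question this would be called as reset(&mut my_vec, false), or simply my_vec.fill(false) without a helper.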
I want to use trait objects in a Vec. In C++ I could make a base class Thing from which is derived Monster1 and Monster2. I could then create a std::vector<Thing*>. Thing objects must store some data e.g. x : int, y : int, but derived classes need to add more data.
Currently I have something like
struct Level {
// some stuff here
pub things: Vec<Box<dyn ThingTrait + 'static>>,
}
struct ThingRecord {
x: i32,
y: i32,
}
struct Monster1 {
thing_record: ThingRecord,
num_arrows: i32,
}
struct Monster2 {
thing_record: ThingRecord,
num_fireballs: i32,
}
I define a ThingTrait with methods for get_thing_record(), attack(), make_noise() etc. and implement them for Monster1 and Monster2.
Trait objects
The most extensible way to implement a heterogeneous collection (in this case a vector) of objects is exactly what you have:
Vec<Box<dyn ThingTrait + 'static>>
Although there are times where you might want a lifetime that's not 'static, so you'd need something like:
Vec<Box<dyn ThingTrait + 'a>>
You could also have a collection of references to traits, instead of boxed traits:
Vec<&dyn ThingTrait>
An example:
trait ThingTrait {
fn attack(&self);
}
impl ThingTrait for Monster1 {
fn attack(&self) {
println!("monster 1 attacks")
}
}
impl ThingTrait for Monster2 {
fn attack(&self) {
println!("monster 2 attacks")
}
}
fn main() {
let m1 = Monster1 {
thing_record: ThingRecord { x: 42, y: 32 },
num_arrows: 2,
};
let m2 = Monster2 {
thing_record: ThingRecord { x: 42, y: 32 },
num_fireballs: 65,
};
let things: Vec<Box<dyn ThingTrait>> = vec![Box::new(m1), Box::new(m2)];
}
Box<dyn SomeTrait>, Rc<dyn SomeTrait>, &dyn SomeTrait, etc. are all trait objects. These allow implementation of the trait on an infinite number of types, but the tradeoff is that it requires some amount of indirection and dynamic dispatch.
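To cover the "base class data" part of the question, one possible sketch is to expose the shared ThingRecord through a required trait method and build default methods on top of it. The method names thing_record and position, the String return type of attack, and the describe helper are assumptions made for this illustration.

```rust
pub struct ThingRecord {
    pub x: i32,
    pub y: i32,
}

pub struct Monster1 {
    pub thing_record: ThingRecord,
    pub num_arrows: i32,
}

pub struct Monster2 {
    pub thing_record: ThingRecord,
    pub num_fireballs: i32,
}

pub trait ThingTrait {
    /// Each implementor hands out its embedded record (the "base class" data).
    fn thing_record(&self) -> &ThingRecord;
    fn attack(&self) -> String;

    /// Default method written once against the shared record.
    fn position(&self) -> (i32, i32) {
        let record = self.thing_record();
        (record.x, record.y)
    }
}

impl ThingTrait for Monster1 {
    fn thing_record(&self) -> &ThingRecord {
        &self.thing_record
    }
    fn attack(&self) -> String {
        format!("monster 1 fires {} arrows", self.num_arrows)
    }
}

impl ThingTrait for Monster2 {
    fn thing_record(&self) -> &ThingRecord {
        &self.thing_record
    }
    fn attack(&self) -> String {
        format!("monster 2 casts {} fireballs", self.num_fireballs)
    }
}

/// Dynamic dispatch picks the right `attack` for every element.
pub fn describe(things: &[Box<dyn ThingTrait>]) -> Vec<String> {
    things
        .iter()
        .map(|thing| {
            let (x, y) = thing.position();
            format!("{} at ({}, {})", thing.attack(), x, y)
        })
        .collect()
}
```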
See also:
What makes something a "trait object"?
What does "dyn" mean in a type?
Enums
As mentioned in the comments, if you have a fixed number of known alternatives, a less open-ended solution is to use an enum. This doesn't require that the values be Boxed, but it will still have a small amount of dynamic dispatch to decide which concrete enum variant is present at runtime:
enum Monster {
One(Monster1),
Two(Monster2),
}
impl Monster {
fn attack(&self) {
match *self {
Monster::One(_) => println!("monster 1 attacks"),
Monster::Two(_) => println!("monster 2 attacks"),
}
}
}
fn main() {
let m1 = Monster1 {
thing_record: ThingRecord { x: 42, y: 32 },
num_arrows: 2,
};
let m2 = Monster2 {
thing_record: ThingRecord { x: 42, y: 32 },
num_fireballs: 65,
};
let things = vec![Monster::One(m1), Monster::Two(m2)];
}
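The enum version can expose the shared record in a similar way, with a match instead of a vtable call. This is only a sketch; the thing_record accessor and the positions helper are hypothetical names.

```rust
pub struct ThingRecord {
    pub x: i32,
    pub y: i32,
}

pub struct Monster1 {
    pub thing_record: ThingRecord,
    pub num_arrows: i32,
}

pub struct Monster2 {
    pub thing_record: ThingRecord,
    pub num_fireballs: i32,
}

pub enum Monster {
    One(Monster1),
    Two(Monster2),
}

impl Monster {
    /// The match is the runtime "dispatch": it inspects the variant tag.
    pub fn thing_record(&self) -> &ThingRecord {
        match self {
            Monster::One(m) => &m.thing_record,
            Monster::Two(m) => &m.thing_record,
        }
    }
}

/// The monsters are stored inline in the slice; no `Box` is involved.
pub fn positions(things: &[Monster]) -> Vec<(i32, i32)> {
    things
        .iter()
        .map(|m| {
            let record = m.thing_record();
            (record.x, record.y)
        })
        .collect()
}
```

Since the variant tag has to be stored somewhere, std::mem::size_of::<Monster>() comes out larger than the size of either monster struct here, which is the extra memory the link below talks about.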
See also:
Why does an enum require extra memory size?
I understand that the preferred way to iterate in Rust is through the for var in (range) syntax, but sometimes I'd like to work on more than one of the elements in that range at a time.
From a Ruby perspective, I'm trying to find a way of doing (1..100).each_slice(5) do |this_slice| in Rust.
I'm trying things like
for mut segment_start in (segment_size..max_val).step_by(segment_size) {
let this_segment = segment_start..(segment_start + segment_size).iter().take(segment_size);
}
but I keep getting errors that suggest I'm barking up the wrong type tree. The docs aren't helpful either--they just don't contain this use case.
What's the Rust way to do this?
Use chunks (or chunks_mut if you need mutability):
fn main() {
let things = [5, 4, 3, 2, 1];
for slice in things.chunks(2) {
println!("{:?}", slice);
}
}
Outputs:
[5, 4]
[3, 2]
[1]
The easiest way to combine this with a Range would be to collect the range to a Vec first (which dereferences to a slice):
fn main() {
let things: Vec<_> = (1..100).collect();
for slice in things.chunks(5) {
println!("{:?}", slice);
}
}
Another solution that is pure-iterator would be to use Itertools::chunks (called chunks_lazy in early itertools releases):
extern crate itertools;
use itertools::Itertools;
fn main() {
for chunk in &(1..100).chunks(5) {
for val in chunk {
print!("{}, ", val);
}
println!("");
}
}
Which suggests a similar solution that only requires the standard library:
fn main() {
let mut range = (1..100).peekable();
while range.peek().is_some() {
for value in range.by_ref().take(5) {
print!("{}, ", value);
}
println!("");
}
}
One trick is that Ruby and Rust have different handling here, mostly centered around efficiency.
In Ruby Enumerable can create new arrays to stuff values in without worrying about ownership and return a new array each time (check with this_slice.object_id).
In Rust, allocating a new vector each time would be pretty unusual. Additionally, you can't easily return a reference to a vector that the iterator holds due to complicated lifetime concerns.
A solution that's very similar to Ruby's is:
fn main() {
let mut range = (1..100).peekable();
while range.peek().is_some() {
let chunk: Vec<_> = range.by_ref().take(5).collect();
println!("{:?}", chunk);
}
}
Which could be wrapped up in a new iterator that hides the details:
use std::iter::Peekable;
struct InefficientChunks<I>
where I: Iterator
{
iter: Peekable<I>,
size: usize,
}
impl<I> Iterator for InefficientChunks<I>
where I: Iterator
{
type Item = Vec<I::Item>;
fn next(&mut self) -> Option<Self::Item> {
if self.iter.peek().is_some() {
Some(self.iter.by_ref().take(self.size).collect())
} else {
None
}
}
}
trait Awesome: Iterator + Sized {
fn inefficient_chunks(self, size: usize) -> InefficientChunks<Self> {
InefficientChunks {
iter: self.peekable(),
size: size,
}
}
}
impl<I> Awesome for I where I: Iterator {}
fn main() {
for chunk in (1..100).inefficient_chunks(5) {
println!("{:?}", chunk);
}
}
Collecting into a Vec can easily kill your performance. An approach similar to the one in the question is perfectly fine:
use std::ops::Range;
fn chunk_range(range: Range<usize>, chunk_size: usize) -> impl Iterator<Item=Range<usize>> {
range.clone().step_by(chunk_size).map(move |block_start| {
let block_end = (block_start + chunk_size).min(range.end);
block_start..block_end
})
}
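A small usage sketch, with chunk_range reproduced so that the snippet is self-contained. The chunk_sums helper is a made-up example of consuming the sub-ranges as slice indices. Note that step_by panics if chunk_size is 0.

```rust
use std::ops::Range;

pub fn chunk_range(range: Range<usize>, chunk_size: usize) -> impl Iterator<Item = Range<usize>> {
    range.clone().step_by(chunk_size).map(move |block_start| {
        // Clamp the last block so it never runs past the end of the range.
        let block_end = (block_start + chunk_size).min(range.end);
        block_start..block_end
    })
}

/// Hypothetical consumer: sums each block of `data` by indexing with the
/// sub-ranges, without collecting any indices into a Vec first.
pub fn chunk_sums(data: &[u32], chunk_size: usize) -> Vec<u32> {
    chunk_range(0..data.len(), chunk_size)
        .map(|block| data[block].iter().sum::<u32>())
        .collect()
}
```

For example, chunk_range(1..12, 5) yields 1..6, 6..11 and 11..12, so only the last block is shorter than the others.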
I need to convert &[u8] to a hex representation. For example [ A9, 45, FF, 00 ... ].
The trait std::fmt::UpperHex is not implemented for slices (so I can't use std::fmt::format). Rust has the serialize::hex::ToHex trait, which converts &[u8] to a hex String, but I need a representation with separate bytes.
I can implement trait UpperHex for &[u8] myself, but I'm not sure how canonical this would be. What is the most canonical way to do this?
Rust 1.26.0 and up
The :x? "debug with hexadecimal integers" formatter can be used:
let data = b"hello";
// lower case
println!("{:x?}", data);
// upper case
println!("{:X?}", data);
let data = [0x0, 0x1, 0xe, 0xf, 0xff];
// print the leading zero
println!("{:02X?}", data);
// It can be combined with the pretty modifier as well
println!("{:#04X?}", data);
Output:
[68, 65, 6c, 6c, 6f]
[68, 65, 6C, 6C, 6F]
[00, 01, 0E, 0F, FF]
[
0x00,
0x01,
0x0E,
0x0F,
0xFF,
]
If you need more control or need to support older versions of Rust, keep reading.
Rust 1.0 and up
use std::fmt::Write;
fn main() {
let mut s = String::new();
for &byte in "Hello".as_bytes() {
write!(&mut s, "{:X} ", byte).expect("Unable to write");
}
println!("{}", s);
}
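The loop above prints single-digit values without a leading zero and leaves a trailing space. If the exact shape from the question (A9, 45, FF, 00) is wanted, a variation along these lines might do; the function name hex_join and its separator parameter are hypothetical.

```rust
use std::fmt::Write;

/// Hypothetical helper: upper-case, zero-padded hex bytes joined by `sep`.
pub fn hex_join(bytes: &[u8], sep: &str) -> String {
    let mut out = String::with_capacity(bytes.len() * (2 + sep.len()));
    for (i, byte) in bytes.iter().enumerate() {
        // Put the separator between elements only, never after the last one.
        if i > 0 {
            out.push_str(sep);
        }
        write!(out, "{:02X}", byte).expect("writing to a String cannot fail");
    }
    out
}
```

Calling hex_join(&[0xA9, 0x45, 0xFF, 0x00], ", ") gives the string A9, 45, FF, 00.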
This can be fancied up by implementing one of the formatting traits (fmt::Debug, fmt::Display, fmt::LowerHex, fmt::UpperHex, etc.) on a wrapper struct and having a little constructor:
use std::fmt;
struct HexSlice<'a>(&'a [u8]);
impl<'a> HexSlice<'a> {
fn new<T>(data: &'a T) -> HexSlice<'a>
where
T: ?Sized + AsRef<[u8]> + 'a,
{
HexSlice(data.as_ref())
}
}
// You can choose to implement multiple traits, like Lower and UpperHex
impl fmt::Display for HexSlice<'_> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
for byte in self.0 {
// Decide if you want to pad the value or have spaces in between, etc.
write!(f, "{:X} ", byte)?;
}
Ok(())
}
}
fn main() {
// To get a `String`
let s = format!("{}", HexSlice::new("Hello"));
// Or print it directly
println!("{}", HexSlice::new("world"));
// Works with
HexSlice::new("Hello"); // string slices (&str)
HexSlice::new(b"Hello"); // byte slices (&[u8])
HexSlice::new(&"World".to_string()); // References to String
HexSlice::new(&vec![0x00, 0x01]); // References to Vec<u8>
}
You can be even fancier and create an extension trait:
trait HexDisplayExt {
fn hex_display(&self) -> HexSlice<'_>;
}
impl<T> HexDisplayExt for T
where
T: ?Sized + AsRef<[u8]>,
{
fn hex_display(&self) -> HexSlice<'_> {
HexSlice::new(self)
}
}
fn main() {
println!("{}", "world".hex_display());
}
Use hex::encode from the hex crate:
let a: [u8; 4] = [1, 3, 3, 7];
assert_eq!(hex::encode(&a), "01030307");
with this in Cargo.toml:
[dependencies]
hex = "0.4"
Since the accepted answer doesn't work on Rust 1.0 stable, here's my attempt. Should be allocationless and thus reasonably fast. This is basically a formatter for [u8], but because of the coherence rules, we must wrap [u8] to a self-defined type ByteBuf(&[u8]) to use it:
struct ByteBuf<'a>(&'a [u8]);
impl<'a> std::fmt::LowerHex for ByteBuf<'a> {
fn fmt(&self, fmtr: &mut std::fmt::Formatter) -> Result<(), std::fmt::Error> {
for byte in self.0 {
try!( fmtr.write_fmt(format_args!("{:02x}", byte)));
}
Ok(())
}
}
Usage:
let buff = [0_u8; 24];
println!("{:x}", ByteBuf(&buff));
There's a crate for this: hex-slice.
For example:
extern crate hex_slice;
use hex_slice::AsHex;
fn main() {
let foo = vec![0u32, 1, 2, 3];
println!("{:02x}", foo.as_hex());
}
I'm doing it this way:
let bytes: Vec<u8> = "привет".as_bytes().to_vec();
// format! already returns a String, so no extra to_string() is needed
let hex: String = bytes.iter()
.map(|b| format!("{:02x}", b))
.collect::<Vec<String>>()
.join(" ");