I am working through the Rust book, namely the minigrep project. There I came across the following snippet:
use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    let (query, filename) = parse_config(&args);
    // --snip--
}
fn parse_config(args: &[String]) -> (&str, &str) {
    let query = &args[1];
    let filename = &args[2];
    (query, filename)
}
The confusing piece for me is args: &[String]. If I replace it with args: &Vec<String>, it also works. My guess is that &[String] is a more general type annotation that matches not only &Vec<String>, but also some other types. Is that correct? If so, what other types are matched by [T]?
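Generally speaking, [T] is a contiguous sequence of T values whose length is not known at compile time, and &[T] is a borrowed view into such a sequence, usually just called a slice.
The reason why the compiler accepts a &Vec<String> where a &[String] is expected is that Vec<T> dereferences to [T], so a &Vec<T> is converted to a &[T] automatically. This is called deref coercion. The slice notation (in function parameters) is the more general of the two, since it also accepts arrays and sub-slices; it is also the preferred one. Further details about automatic dereferencing rules can be found in this question.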
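To make "more general" concrete, here is a minimal sketch of one slice-taking function called with several different argument types. The helper first_len is hypothetical and exists only for illustration; it is not part of the minigrep project.

```rust
// Hypothetical helper: returns the length of the first string, or 0 if there is none.
// Taking `&[String]` lets callers pass anything that can be viewed as a slice.
fn first_len(args: &[String]) -> usize {
    args.first().map_or(0, |s| s.len())
}

fn main() {
    let v: Vec<String> = vec!["minigrep".to_string(), "needle".to_string()];
    let arr: [String; 2] = ["a".to_string(), "bc".to_string()];
    let boxed: Box<[String]> = v.clone().into_boxed_slice();

    // &Vec<String> becomes &[String] through deref coercion.
    println!("{}", first_len(&v));
    // A reference to a fixed-size array coerces to a slice as well.
    println!("{}", first_len(&arr));
    // A sub-slice of the vector is already a &[String].
    println!("{}", first_len(&v[1..]));
    // &Box<[String]> also dereferences to [String].
    println!("{}", first_len(&boxed));
}
```

With a &Vec<String> parameter, only the first of these four calls would compile.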
fn count_spaces(text: Vec<u8>) -> usize {
    text.split(|c| c == 32u8).count()
}
The above function does not compile, and gives the following error on the comparison:
trait `&u8: std::cmp::PartialEq` not satisfied
I read this as: "c is a borrowed byte and cannot be compared to a regular byte", but I must be reading this wrong.
What would be the appropriate way to split a Vec<u8> on specific values?
I do realize that there are options when reading files, like splitting a BufReader or I could convert the vector to a string and use str::split. I might go with such a solution (passing in a BufReader instead of a Vec<u8>), but right now I'm just playing around, testing stuff and want to know what I'm doing wrong.
The code
You are actually reading it right: c is indeed a borrowed byte and cannot be compared to a regular byte. Try using any of the functions below instead:
fn count_spaces(text: Vec<u8>) -> usize {
    text.split(|&c| c == 32u8).count()
}
fn count_spaces(text: Vec<u8>) -> usize {
    text.split(|c| *c == 32u8).count()
}
The first one uses pattern matching on the parameter (&c) to dereference it, while the second one uses the dereference operator (*).
Why is c a &u8 instead of a u8?
If you take a look at the split method on the docs, you will see that the closure parameter is a borrow of the data in Vec. In this case, it means that the parameter will be &u8 instead of u8 (so in your code you are actually comparing &u8 to u8, which Rust doesn't like).
In order to understand why the closure takes the parameter by borrow and not by value, consider what would happen if the parameter were taken by value. In the case of Vec<u8>, there would be no problem since u8 implements Copy. However, in the case of a Vec<String>, each String would be moved into the closure and destroyed!
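As a rough illustration of the point about borrowing, the sketch below calls split on both a byte slice and a slice of String values. The function names count_fields and count_groups are made up for this example, and the functions take slices rather than Vec so the caller keeps ownership.

```rust
// The predicate receives `&u8`; the `&c` pattern dereferences it before comparing.
fn count_fields(text: &[u8]) -> usize {
    text.split(|&c| c == b' ').count()
}

// The same method on `String` elements: the predicate only borrows each element,
// so nothing is moved out of the underlying vector.
fn count_groups(words: &[String]) -> usize {
    words.split(|w| w.is_empty()).count()
}

fn main() {
    let bytes = b"a b c".to_vec();
    println!("{}", count_fields(&bytes));

    let words = vec!["x".to_string(), String::new(), "y".to_string()];
    println!("{}", count_groups(&words));
    // `words` is still fully usable because the closure never took ownership.
    println!("{:?}", words);
}
```

If the predicate took its argument by value, the second function could not leave the strings in place.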
I want to build a string s by iterating over a vector of simple structs, appending different strings to acc depending on the struct.
#[derive(Clone, Debug)]
struct Point(Option<i32>, Option<i32>);
impl Point {
    fn get_first(&self) -> Option<i32> {
        self.0
    }
}
fn main() {
    let mut vec = vec![Point(None, None); 10];
    vec[5] = Point(Some(1), Some(1));
    let s: String = vec.iter().fold(
        String::new(),
        |acc, &ref e| acc + match e.get_first() {
            None => "",
            Some(ref content) => &content.to_string()
        }
    );
    println!("{}", s);
}
Running this code results in the following error:
error: borrowed value does not live long enough
Some(ref content) => &content.to_string()
^~~~~~~~~~~~~~~~~~~
note: reference must be valid for the expression at 21:22...
|acc, &ref e| acc + match e.get_first() {
^
note: ...but borrowed value is only valid for the expression at 23:33
Some(ref content) => &content.to_string()
^~~~~~~~~~~~~~~~~~~~
The problem is that the lifetime of the &str I create seems to end immediately. However, if to_string() would have returned a &str in the first place, the compiler would not have complained. Then, what is the difference?
How can I make the compiler understand that I want the string references to live as long as I am constructing s?
There is a difference between the result of your branches:
"" is of type &'static str
content is of type i32, so you are converting it to a String and then from that to a &str... but this &str has the same lifetime as the String returned by to_string, which dies too early
A quick work-around, as mentioned by @Dogbert, is to move acc + inside the branches:
let s: String = vec.iter().fold(
    String::new(),
    |acc, &ref e| match e.get_first() {
        None => acc,
        Some(ref content) => acc + &content.to_string(),
    }
);
However, it's a bit wasteful, because every time we have an integer, we are allocating a String (via to_string) just to immediately discard it.
A better solution is to use the write! macro instead, which just appends to the original string buffer. This means there are no wasted allocations.
use std::fmt::Write;
let s = vec.iter().fold(
    String::new(),
    |mut acc, &ref e| {
        if let Some(ref content) = e.get_first() {
            write!(&mut acc, "{}", content).expect("Should have been able to format!");
        }
        acc
    }
);
It's maybe a bit more complicated, notably because formatting adds in error handling, but is more efficient as it only uses a single buffer.
There are multiple solutions to your problem. But first some explanations:
If to_string() would have returned a &str in the first place, the compiler would not have complained. Then, what is the difference?
Suppose there is a method to_str() that returns a &str. What would the signature look like?
fn to_str(&self) -> &str {}
To better understand the issue, let's add explicit lifetimes (that are not necessary thanks to lifetime elision):
fn to_str<'a>(&'a self) -> &'a str {}
It becomes clear that the returned &str lives as long as the receiver of the method (self). This would be OK since the receiver lives long enough for your acc + ... operation. In your case, however, the .to_string() call creates a new object that only lives in the second match arm. After the arm's body is left, it will be destroyed. Therefore you can't pass a reference to it to the outer scope (in which acc + ... takes place).
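The difference can be sketched with a small, made-up type. Label, as_text and describe are hypothetical names used only for this illustration. A &str borrowed from the receiver can leave the match arm, whereas a &str borrowed from a freshly created String cannot.

```rust
// Hypothetical type used only to illustrate the lifetime relationship.
struct Label {
    text: String,
}

impl Label {
    // The returned &str borrows from `self`, so it lives as long as the receiver.
    fn as_text(&self) -> &str {
        &self.text
    }
}

fn describe(acc: String, label: Option<&Label>) -> String {
    acc + match label {
        None => "",
        // Accepted: the &str borrows from `l`, which outlives the whole expression.
        Some(l) => l.as_text(),
        // Rejected if written as `Some(l) => &l.text.to_uppercase()`:
        // the temporary String would be dropped at the end of the match arm.
    }
}

fn main() {
    let label = Label { text: "abc".to_string() };
    println!("{}", describe("x".to_string(), Some(&label)));
    println!("{}", describe("x".to_string(), None));
}
```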
So one possible solution looks like this:
let s = vec.iter().fold(
    String::new(),
    |acc, e| {
        acc + &e.get_first()
            .map(|f| f.to_string())
            .unwrap_or(String::new())
    }
);
It's not optimal, but luckily your default value is an empty string and the owned version of an empty string (String::new()) does not require any heap allocations, so there is no performance penalty.
However, we are still allocating once per integer. For a more efficient solution, see Matthieu M.'s answer.
In general, I prefer to write initializer functions with descriptive names. However, for some structs, there is an obvious default initializer function. The standard Rust name for such a function is new, placed in the impl block for the struct. However, today I realized that I can give a function the same name as a struct, and thought this would be a good way to implement the obvious initializer function. For example:
#[derive(Debug, Clone, Copy)]
struct Pair<T, U> {
    first: T,
    second: U,
}
#[allow(non_snake_case)]
fn Pair<T, U>(first: T, second: U) -> Pair<T, U> {
    Pair::<T, U> {
        first: first,
        second: second,
    }
}
fn main(){
    let x = Pair(1, 2);
    println!("{:?}", x);
}
This is, in my opinion, much more appealing than this:
let x = Pair::new(1, 2);
However, I've never seen anyone else do this, and my question is simply if there are any problems with this approach. Are there, for example, ambiguities which it can cause which will not be there with the new implementation?
If you want to use Pair(T, U) then you should consider using a tuple struct instead:
#[derive(Debug, Clone, Copy)]
struct Pair<T, U>(T, U);
fn main(){
    let x = Pair(1, 2);
    println!("{:?}", x);
    // One argument per placeholder; passing a single tuple here would not compile.
    println!("{:?}, {:?}", x.0, x.1);
}
Or, y’know, just a tuple ((T, U)). But I presume that Pair is not your actual use case.
There was a time when having identically named functions was the convention for default constructors; this convention fell out of favour as time went by. It is considered bad form nowadays, probably mostly for consistency. If you have a tuple struct (or variant) Pair(T, U), then you can use Pair(first, last) in a pattern, but if you have Pair { first: T, last: U } then you would need to use something more like Pair { first, last } in a pattern, and so your Pair(first, last) function would be inconsistent with the pattern. It is generally felt, thus, that these kinds of camel-case functions should be reserved solely for tuple structs and tuple variants, where it can be known that the call genuinely reflects what is contained in the data structure, with no further processing or magic.
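The consistency argument can be sketched as follows. TuplePair, NamedPair, sum_tuple and sum_named are hypothetical names chosen for this example. For the tuple struct, construction and destructuring use the same syntax; for the named-field struct, a conventional new function sits next to a brace pattern.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct TuplePair<T, U>(T, U);

#[derive(Debug, Clone, Copy, PartialEq)]
struct NamedPair<T, U> {
    first: T,
    second: U,
}

impl<T, U> NamedPair<T, U> {
    // Conventional constructor for a struct with named fields.
    fn new(first: T, second: U) -> Self {
        NamedPair { first, second }
    }
}

fn sum_tuple(p: TuplePair<i32, i32>) -> i32 {
    // The pattern mirrors the constructor call `TuplePair(a, b)`.
    let TuplePair(a, b) = p;
    a + b
}

fn sum_named(p: NamedPair<i32, i32>) -> i32 {
    // Named fields must be destructured with braces, whatever the constructor looks like.
    let NamedPair { first, second } = p;
    first + second
}

fn main() {
    println!("{}", sum_tuple(TuplePair(1, 2)));
    println!("{}", sum_named(NamedPair::new(1, 2)));
}
```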
I'm attempting to write Rust bindings for a C collection library (Judy Arrays [1]) which only provides itself room to store a pointer-width value. My company has a fair amount of existing code which uses this space to directly store non-pointer values such as pointer-width integers and small structs. I'd like my Rust bindings to allow type-safe access to such collections using generics, but am having trouble getting the pointer-stashing semantics working correctly.
The mem::transmute() function seems like one potential tool for implementing the desired behavior, but attempting to use it on an instance of a parameterized type yields a confusing-to-me compilation error.
Example code:
use std::marker::PhantomData;
use std::mem;

pub struct Example<T> {
    v: usize,
    t: PhantomData<T>,
}
impl<T> Example<T> {
    pub fn new() -> Example<T> {
        Example { v: 0, t: PhantomData }
    }
    pub fn insert(&mut self, val: T) {
        unsafe {
            self.v = mem::transmute(val);
        }
    }
}
Resulting error:
src/lib.rs:95:22: 95:36 error: cannot transmute to or from a type that contains type parameters in its interior [E0139]
src/lib.rs:95 self.v = mem::transmute(val);
^~~~~~~~~~~~~~
Does this mean a type consisting only of a parameter "contains type parameters in its interior" and thus transmute() just won't work here? Any suggestions of the right way to do this?
(Related question, attempting to achieve the same result, but not necessarily via mem::transmute().)
[1] I'm aware of the existing rust-judy project, but it doesn't support the pointer-stashing I want, and I'm writing these new bindings largely as a learning exercise anyway.
Instead of transmuting T to usize directly, you can transmute a &T to &usize:
pub fn insert(&mut self, val: T) {
    unsafe {
        let usize_ref: &usize = mem::transmute(&val);
        self.v = *usize_ref;
    }
}
Beware that this reads from an invalid memory location if the size of T is smaller than the size of usize, and performs a misaligned read if T has a weaker alignment requirement than usize. Either case is undefined behavior and could cause a segfault. You can add assertions to prevent this:
assert_eq!(mem::size_of::<T>(), mem::size_of::<usize>());
assert!(mem::align_of::<usize>() <= mem::align_of::<T>());
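As an alternative sketch, current versions of the standard library provide mem::transmute_copy, which copies the bits out of a reference and is documented to work even when the destination type has a stricter alignment than the source. The Stash type and its get method below are hypothetical and not part of the original code. The sketch assumes T is a plain, padding-free value exactly as wide as usize; it does not address pointer-typed or padded T.

```rust
use std::marker::PhantomData;
use std::mem;

// Hypothetical wrapper that stores one pointer-width value as raw bits.
pub struct Stash<T> {
    v: usize,
    t: PhantomData<T>,
}

impl<T> Stash<T> {
    pub fn new() -> Self {
        // Refuse to build a stash for types that are not exactly pointer-width.
        assert_eq!(mem::size_of::<T>(), mem::size_of::<usize>());
        Stash { v: 0, t: PhantomData }
    }

    pub fn insert(&mut self, val: T) {
        // SAFETY (assumption): sizes were checked in `new`, and T has no padding bytes.
        self.v = unsafe { mem::transmute_copy::<T, usize>(&val) };
        // The bits now live in `self.v`, so do not run `val`'s destructor here.
        mem::forget(val);
    }

    // Reading back is restricted to Copy types so the value is never duplicated unsoundly.
    pub fn get(&self) -> T
    where
        T: Copy,
    {
        // SAFETY (assumption): `self.v` holds bits that form a valid T.
        unsafe { mem::transmute_copy::<usize, T>(&self.v) }
    }
}

fn main() {
    let mut s: Stash<isize> = Stash::new();
    s.insert(-5);
    println!("{}", s.get());
}
```

Checking the size in new rather than in insert is a design choice made for this sketch; either placement catches a mismatched T before any bits are copied.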
I don't understand why rustc gives me the error "use of moved value: `f`" at compile time, with the following code:
fn inner(f: &fn(&mut int)) {
    let mut a = ~1;
    f(a);
}
fn borrow(b: &mut int, f: &fn(&mut int)) {
    f(b);
    f(b); // can reuse borrowed variable
    inner(f); // shouldn't f be borrowed?
    // Why can't I reuse the borrowed reference to a function?
    // ** error: use of moved value: `f` **
    //f(b);
}
fn main() {
    let mut a = ~1;
    print!("{}", (*a));
    borrow(a, |x: &mut int| *x+=1);
    print!("{}", (*a));
}
I want to reuse the closure after I pass it as argument to another function. I am not sure if it is a copyable or a stack closure, is there a way to tell?
That snippet was for rustc 0.8. I managed to compile a different version of the code with the latest rustc (master: g67aca9c), changing the &fn(&mut int) to a plain fn(&mut int) and using normal functions instead of a closure, but how can I get this to work with a closure?
The fact of the matter is that &fn is not actually a borrowed pointer in the normal sense. It's a closure type. In master, the function types have been fixed up a lot and the syntax for such things has changed to |&mut int|—if you wanted a borrowed pointer to a function, for the present you need to type it &(fn (...)) (&fn is marked obsolete syntax for now, to help people migrating away from it, because it's a completely distinct type).
But for closures, you can then go passing them around by reference: &|&mut int|.
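For readers on a current compiler, the syntax discussed above no longer exists. The following is a rough translation of the same idea into present-day Rust, not the code from the question: it assumes i32 in place of int, a plain local instead of the ~ box, and the FnMut trait in place of the old closure type. Passing the closure as &mut impl FnMut(&mut i32) lends it out instead of moving it, so it stays usable afterwards.

```rust
fn inner(f: &mut impl FnMut(&mut i32)) {
    let mut a = 1;
    f(&mut a);
}

fn borrow(b: &mut i32, f: &mut impl FnMut(&mut i32)) {
    f(b);
    f(b); // the borrowed integer can be reused
    inner(f); // `f` is reborrowed here, not moved
    f(b); // so the closure can still be called afterwards
}

fn main() {
    let mut a = 1;
    print!("{}", a);
    borrow(&mut a, &mut |x: &mut i32| *x += 1);
    print!("{}", a);
}
```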