I wanted to write a simple implementation of Peano numbers in Rust and it seems that I managed to get the basics working:
use self::Peano::*;
use std::ops::Add;

#[derive(Debug, PartialEq)]
enum Peano {
    Zero,
    Succ(Box<Peano>)
}

impl Add for Peano {
    type Output = Peano;

    fn add(self, other: Peano) -> Peano {
        match other {
            Zero => self,
            Succ(x) => Succ(Box::new(self + *x))
        }
    }
}

fn main() {
    assert_eq!(Zero + Zero, Zero);
    assert_eq!(Succ(Box::new(Zero)) + Zero, Succ(Box::new(Zero)));
    assert_eq!(Zero + Succ(Box::new(Zero)), Succ(Box::new(Zero)));
    assert_eq!(Succ(Box::new(Zero)) + Succ(Box::new(Zero)), Succ(Box::new(Succ(Box::new(Zero)))));
    assert_eq!(Succ(Box::new(Zero)) + Zero + Succ(Box::new(Zero)), Succ(Box::new(Succ(Box::new(Zero)))));
}
However, when I decided to take a look at how others implemented it, I saw that no one did it with an enum, but rather with structs and PhantomData (example 1, example 2).
Is there something wrong with my implementation? Is this because Zero and Succ are enum variants and not true types (so my implementation is not actual type arithmetic)? Or is it just preferable to do this the "mainstream" way due to difficulties that would occur if I expanded my implementation?
Edit: my struggles with implementing Peano numbers using structs can be seen here.
Your Peano numbers are on the level of values, which are used for computation when the program is running. This is fine for playing around but not very useful, because binary numbers like i32 are much more efficient.
The other implementations represent Peano numbers on the type level, where you currently can't use ordinary numbers. This allows you to express types that depend on a number, for example arrays of a fixed size. The computations then take place while the compiler is inferring types.
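Roughly, the type-level encoding used in those implementations looks like this (a minimal sketch with zero-sized structs and PhantomData, my own illustration rather than code from the linked examples):
use std::marker::PhantomData;

// Numbers are types, not values: Zero and Succ<N> carry no data at runtime.
struct Zero;
struct Succ<N>(PhantomData<N>);

// "Addition" is computed by the type checker through an associated type.
trait Add<Rhs> {
    type Output;
}

// 0 + n = n
impl<Rhs> Add<Rhs> for Zero {
    type Output = Rhs;
}

// S(n) + m = S(n + m)
impl<N: Add<Rhs>, Rhs> Add<Rhs> for Succ<N> {
    type Output = Succ<<N as Add<Rhs>>::Output>;
}

type One = Succ<Zero>;
type Two = Succ<One>;

fn main() {
    // This only compiles because <One as Add<One>>::Output is exactly Two.
    let _check: PhantomData<Two> = PhantomData::<<One as Add<One>>::Output>;
}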
I encountered this competitive programming problem:
nums is a vector of integers (length n)
ops is a vector of strings containing + and - (length n-1)
It can be solved with the reduce operation in Kotlin like this:
val op_iter = ops.iterator();
nums.reduce { a, b ->
    when (op_iter.next()) {
        "+" -> a + b
        "-" -> a - b
        else -> throw Exception()
    }
}
reduce is described as:
Accumulates value starting with the first element and applying operation from left to right to current accumulator value and each element.
It looks like Rust vectors do not have a reduce method. How would you achieve this task?
Edit: since Rust version 1.51.0, this function is called reduce.
Be aware of the similar function called fold. The difference is that reduce produces None if the iterator is empty, while fold accepts an accumulator and produces the accumulator's value for an empty iterator.
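A quick illustration of the difference (assuming Rust 1.51 or later, where reduce is stable):
fn main() {
    let empty: Vec<i32> = vec![];
    // reduce has no initial value, so an empty iterator yields None
    assert_eq!(empty.iter().copied().reduce(|a, b| a * b), None);
    // fold takes an explicit accumulator and returns it for an empty iterator
    assert_eq!(empty.iter().copied().fold(1, |a, b| a * b), 1);
}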
The outdated answer below is left to capture the history of the debate over how to name this function:
There is no reduce in Rust 1.48. In many cases you can simulate it with fold, but be aware that the semantics of these functions are different: if the iterator is empty, fold returns the initial value, whereas reduce returns None. If you want to multiply all elements, for example, getting the result 1 for an empty set is not very logical.
Rust does have a fold_first function which is equivalent to Kotlin's reduce, but it is not stable yet. The main discussion is about naming it. It is a safe bet to use it if you are OK with nightly Rust, because there is little chance the function will be removed; in the worst case, the name will be changed. If you need stable Rust, use fold if you are OK with an illogical result for empty sets. If not, you'll have to implement it yourself, or find a crate such as reduce.
Kotlin's reduce takes the first item of the iterator for the starting point while Rust's fold and try_fold let you specify a custom starting point.
Here is an equivalent of the Kotlin code:
let mut it = nums.iter().cloned();
let start = it.next().unwrap();
it.zip(ops.iter()).try_fold(start, |a, (b, op)| match op {
    '+' => Ok(a + b),
    '-' => Ok(a - b),
    _ => Err(()),
})
Playground
Or since we're starting from a vector, which can be indexed:
nums[1..]
    .iter()
    .zip(ops.iter())
    .try_fold(nums[0], |a, (b, op)| match op {
        '+' => Ok(a + b),
        '-' => Ok(a - b),
        _ => Err(()),
    });
Playground
I came across this while doing the 2018 Advent of Code (Day 2, Part 1) solution in Rust.
The problem to solve:
Take the count of strings that have exactly two of the same letter, multiplied by the count of strings that have exactly three of the same letter.
INPUT
abcdega
hihklmh
abqasbb
aaaabcd
The first string abcdega has a repeated twice.
The second string hihklmh has h repeated three times.
The third string abqasbb has a repeated twice, and b repeated three times, so it counts for both.
The fourth string aaaabcd contains a letter repeated 4 times (not 2, or 3) so it does not count.
So the result should be:
2 strings that contained a double letter (first and third) multiplied by 2 strings that contained a triple letter (second and third) = 4
The Question:
const PUZZLE_INPUT: &str =
"
abcdega
hihklmh
abqasbb
aaaabcd
";
fn letter_counts(id: &str) -> [u8; 26] {
    id.chars().map(|c| c as u8).fold([0; 26], |mut counts, c| {
        counts[usize::from(c - b'a')] += 1;
        counts
    })
}
fn has_repeated_letter(n: u8, letter_counts: &[u8; 26]) -> bool {
    letter_counts.iter().any(|&count| count == n)
}
fn main() {
    let ids_iter = PUZZLE_INPUT.lines().map(letter_counts);
    let num_ids_with_double = ids_iter.clone().filter(|id| has_repeated_letter(2, id)).count();
    let num_ids_with_triple = ids_iter.filter(|id| has_repeated_letter(3, id)).count();
    println!("{}", num_ids_with_double * num_ids_with_triple);
}
Rust Playground
Consider the line let ids_iter = PUZZLE_INPUT.lines().map(letter_counts); (line 21 on the playground). The function letter_counts takes only one argument, so I can use the syntax .map(letter_counts) on elements that match the type of the expected argument. This is really nice to me! I love that I don't have to create a closure: .map(|id| letter_counts(id)). I find both to be readable, but the former version without the closure is much cleaner to me.
Now consider the two filter lines (lines 22 and 23 on the playground). Here, I have to use the syntax .filter(|id| has_repeated_letter(3, id)) because the has_repeated_letter function takes two arguments. I would really like to do .filter(has_repeated_letter(3)) instead.
Sure, I could make the function take a tuple instead, map to a tuple and consume only a single argument... but that seems like a terrible solution. I'd rather just create the closure.
Leaving out the only argument is something that Rust lets you do. Why would it be any harder for the compiler to let you leave out the last argument, provided that it has all of the other n-1 arguments for a function that takes n arguments?
I feel like this would make the syntax a lot cleaner, and it would fit in a lot better with the idiomatic functional style that Rust prefers.
I am certainly no expert in compilers, but implementing this behavior seems like it would be straightforward. If my thinking is incorrect, I would love to know more about why that is so.
No, you cannot pass a function with multiple arguments as an implicit closure.
In certain cases, you can choose to use currying to reduce the arity of a function. For example, here we reduce the add function from 2 arguments to one:
fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn curry<A1, A2, R>(f: impl FnOnce(A1, A2) -> R, a1: A1) -> impl FnOnce(A2) -> R {
    move |a2| f(a1, a2)
}

fn main() {
    let a = Some(1);
    a.map(curry(add, 2));
}
However, I agree with the comments that this isn't a benefit:
It's not any less typing:
a.map(curry(add, 2));
a.map(|v| add(v, 2));
The curry function is extremely limited: it chooses to use FnOnce, but Fn and FnMut also have use cases. It only applies to a function with two arguments.
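For what it's worth, a Fn-based variant that can be called repeatedly is possible when the captured argument can be copied (a sketch under that assumption, not part of the original code):
fn add(a: i32, b: i32) -> i32 {
    a + b
}

// Returns a reusable closure because it only requires Fn, not FnOnce.
fn curry_fn<A1: Copy, A2, R>(f: impl Fn(A1, A2) -> R, a1: A1) -> impl Fn(A2) -> R {
    move |a2| f(a1, a2)
}

fn main() {
    let add2 = curry_fn(add, 2);
    assert_eq!(add2(1), 3);
    assert_eq!(add2(10), 12); // callable more than once
}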
However, I have used this higher-order function trick in other projects, where the amount of code that is added is much greater.
In dynamic languages, like Clojure, it is easy to express collections with different types:
{:key1 "foo", :key2 [34 "bar" 4.5], "key3" {:key4 "foobar"}}
In Rust, the preferred way to implement such collections is using trait objects or enums. The use of the Any trait object seems to be the most flexible approach (if there isn't a fixed number of known type alternatives) as it allows downcasting to the actual object types:
let mut vector: Vec<Box<Any>> = Vec::new();
vector.push(Box::new("I’m"));
vector.push(Box::new(4 as u32));
console!(log, vector[0].downcast_ref::<&str>());
console!(log, vector[1].downcast_ref::<u32>());
This approach seems to be discouraged. What are its disadvantages?
What are its disadvantages?
The main disadvantage is that, when you access a value from a collection of &Any, the only thing you can do with it is downcast it to a specific, known type. If there is a type you don't know about, then the values of that type are completely opaque: you can literally do nothing with them except count how many there are. If you know the concrete type then you can downcast to it, but you would need to try each possible type that it could be.
Here's an example:
let a = 1u32;
let b = 2.0f32;
let c = "hello";
let v: Vec<&dyn Any> = vec![&a, &b, &c];
match v[0].downcast_ref::<u32>() {
    Some(x) => println!("u32: {:?}", x),
    None => println!("Not a u32!"),
}
Notice that I had to explicitly downcast to a u32. Using this approach would involve a logical branch for every possible concrete type, and no compiler warnings if I had forgotten a case.
Trait objects are more versatile because you don't need to know the concrete types in order to use the values - as long as you stick to only using methods of the trait.
For example:
let v: Vec<&dyn Debug> = vec![&a, &b, &c];
println!("{:?}", v[0]); // 1
println!("{:?}", v[1]); // 2.0
println!("{:?}", v[2]); // "hello"
I was able to use all of the values without knowing their concrete types, because I only used the fact that they implement Debug.
Both of those approaches have a downside over using a concrete type in a homogeneous collection: everything is hidden behind pointers. Accessing the data is always indirect, and that data could end up being spread around in memory, making it much less efficient to access and harder for the compiler to optimize.
Making the collection homogeneous with an enum looks like this:
enum Item<'a> {
    U32(u32),
    F32(f32),
    Str(&'a str),
}

let v: Vec<Item> = vec![Item::U32(a), Item::F32(b), Item::Str(c)];

match v[0] {
    Item::U32(x) => println!("u32: {:?}", x),
    Item::F32(x) => println!("f32: {:?}", x),
    Item::Str(x) => println!("str: {:?}", x),
}
Here, I still have to know all of the types, but at least there would be a compiler warning if I missed one. Also notice that the enum can own its values, so (apart from the &str in this case) the data can be tightly packed in memory, making it faster to access.
In summary, Any is rarely the right answer for a heterogeneous collection, but both trait objects and enums have their own trade-offs.
fn count_spaces(text: Vec<u8>) -> usize {
    text.split(|c| c == 32u8).count()
}
The above function does not compile, and gives the following error on the comparison:
trait `&u8: std::cmp::PartialEq` not satisfied
I read this as: "c is a borrowed byte and cannot be compared to a regular byte", but I must be reading this wrong.
What would be the appropriate way to split a Vec<u8> on specific values?
I do realize that there are options when reading files, like splitting a BufReader or I could convert the vector to a string and use str::split. I might go with such a solution (passing in a BufReader instead of a Vec<u8>), but right now I'm just playing around, testing stuff and want to know what I'm doing wrong.
The code
You are actually reading it right: c is indeed a borrowed byte and cannot be compared to a regular byte. Try using any of the functions below instead:
fn count_spaces(text: Vec<u8>) -> usize {
    text.split(|&c| c == 32u8).count()
}

fn count_spaces(text: Vec<u8>) -> usize {
    text.split(|c| *c == 32u8).count()
}
The first one uses pattern matching on the parameter (&c) to dereference it, while the second one uses the dereference operator (*).
Why is c a &u8 instead of a u8?
If you take a look at the split method on the docs, you will see that the closure parameter is a borrow of the data in Vec. In this case, it means that the parameter will be &u8 instead of u8 (so in your code you are actually comparing &u8 to u8, which Rust doesn't like).
In order to understand why the closure takes the parameter by borrow and not by value, consider what would happen if the parameter were taken by value. In the case of Vec<u8> there would be no problem, since u8 implements Copy. However, in the case of a Vec<String>, each String would be moved into the closure and destroyed!
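To make that concrete, here is a small sketch with a Vec<String> (my example, not from the question): the closure only borrows each element, so nothing is moved out of the vector.
fn main() {
    let words: Vec<String> = vec!["a".into(), String::new(), "b".into()];
    // The closure receives &String; the Strings stay owned by `words`.
    let groups = words.split(|s| s.is_empty()).count();
    assert_eq!(groups, 2);
    // `words` is still fully usable afterwards.
    assert_eq!(words.len(), 3);
}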
I'm trying to get a graph clustering algorithm to work in Rust. Part of the code is a WeightedGraph data structure with an adjacency list representation. The core would be represented like this (shown in Python to make it clear what I'm trying to do):
class Edge(object):
    def __init__(self, target, weight):
        self.target = target
        self.weight = weight

class WeightedGraph(object):
    def __init__(self, initial_size):
        self.adjacency_list = [[] for i in range(initial_size)]
        self.size = initial_size
        self.edge_count = 0

    def add_edge(self, source, target, weight):
        self.adjacency_list[source].append(Edge(target, weight))
        self.edge_count += 1
So, the adjacency list holds an array of n arrays: one array for each node in the graph. The inner array holds the neighbors of that node, represented as Edge (the target node number and the double weight).
My attempt to translate the whole thing to Rust looks like this:
struct Edge {
    target: uint,
    weight: f64
}

struct WeightedGraph {
    adjacency_list: ~Vec<~Vec<Edge>>,
    size: uint,
    edge_count: int
}

impl WeightedGraph {
    fn new(num_nodes: uint) -> WeightedGraph {
        let mut adjacency_list: ~Vec<~Vec<Edge>> = box Vec::from_fn(num_nodes, |idx| box Vec::new());
        WeightedGraph {
            adjacency_list: adjacency_list,
            size: num_nodes,
            edge_count: 0
        }
    }

    fn add_edge(mut self, source: uint, target: uint, weight: f64) {
        self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
        self.edge_count += 1;
    }
}
But rustc gives me this error:
weightedgraph.rs:24:9: 24:40 error: cannot borrow immutable dereference of `~`-pointer as mutable
weightedgraph.rs:24 self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So, 2 main questions:
1. How can I get the add_edge method to work?
I'm thinking that WeightedGraph is supposed to own all its inner data (please correct me if I'm wrong). But why can add_edge not modify the graph's own data?
2. Is ~Vec<~Vec<Edge>> the correct way to represent a variable-sized array/list that holds a dynamic list in each element?
The tutorial also mentions ~[int] as vector syntax, so should it be: ~[~[Edge]] instead? Or what is the difference between Vec<Edge> and ~[Edge]? And if I'm supposed to use ~[~[Edge]], how would I construct/initialize the inner lists then? (currently, I tried to use Vec::from_fn)
The WeightedGraph does own all its inner data, but even if you own something you have to opt into mutating it. get gives you a & pointer, to mutate you need a &mut pointer. Vec::get_mut will give you that: self.adjacency_list.get_mut(source).push(...).
Regarding ~Vec<Edge> and ~[Edge]: It used to be (until very recently) that ~[T] denoted a growable vector of T, unlike every other type that's written ~... This special case was removed and ~[T] is now just a unique pointer to a T-slice, i.e. an owning pointer to a bunch of Ts in memory without any growth capability. Vec<T> is now the growable vector type.
Note that it's Vec<T>, not ~Vec<T>; the ~ used to be part of the vector syntax, but here it's just an ordinary unique pointer and represents completely unnecessary indirection and allocation. You want adjacency_list: Vec<Vec<Edge>>. A Vec<T> is a fully fledged concrete type (a triple of data pointer, length and capacity, if that means anything to you); it encapsulates the memory allocation and indirection, and you can use it as a value. You gain nothing by boxing it, and lose clarity as well as performance.
You have another (minor) issue: fn add_edge(mut self, ...), like fn add_edge(self, ...), means "take self by value". Since the adjacency_list member is a linear type (it can be dropped, it is moved instead of copied implicitly), your WeightedGraph is also a linear type. The following code will fail because the first add_edge call consumed the graph.
let g = WeightedGraph::new(2);
g.add_edge(1, 0, 2); // moving out of g
g.add_edge(0, 1, 3); // error: use of g after move
You want &mut self: Allow mutation of self but don't take ownership of it/don't move it.
get only returns immutable references; you have to use get_mut if you want to modify the data.
You only need Vec<Vec<Edge>>; Vec is the right thing to use here. ~[] was used for that purpose in the past, but now means something else (or will; I'm not sure whether that change has landed yet).
You also have to change the signature of add_edge to take &mut self, because right now you are moving ownership of self into add_edge, and that is not what you want.
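For reference, a minimal sketch of the fixed structure in today's Rust (an adaptation, since the question's code predates Rust 1.0: uint became usize, the ~ pointers are gone, and indexing replaces get_mut):
struct Edge {
    target: usize,
    weight: f64,
}

struct WeightedGraph {
    adjacency_list: Vec<Vec<Edge>>, // no extra boxing needed
    size: usize,
    edge_count: usize,
}

impl WeightedGraph {
    fn new(num_nodes: usize) -> WeightedGraph {
        WeightedGraph {
            adjacency_list: (0..num_nodes).map(|_| Vec::new()).collect(),
            size: num_nodes,
            edge_count: 0,
        }
    }

    // &mut self: mutate the graph without taking ownership of it
    fn add_edge(&mut self, source: usize, target: usize, weight: f64) {
        self.adjacency_list[source].push(Edge { target, weight });
        self.edge_count += 1;
    }
}

fn main() {
    let mut g = WeightedGraph::new(2);
    g.add_edge(1, 0, 2.0);
    g.add_edge(0, 1, 3.0); // no move errors: g is only borrowed mutably
    assert_eq!(g.edge_count, 2);
}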