Web scraping: how to find first `span` by its text content

I'm brand new to Rust and am trying to learn about its closures.
use scraper::{ElementRef, Html, Selector};
fn findFirstSpanByTextContent<'a>(html: &'a Html, text: &'a str) -> Option<ElementRef<'a>> {
let spans = Selector::parse("span").unwrap();
let closure = |el: ElementRef| -> bool {
return el.text().next().unwrap().to_string() == text.to_string();
};
return html.select(&spans).find(closure);
}
Currently I'm getting this error:
error[E0631]: type mismatch in closure arguments
--> src/scraper.rs:40:37
|
37 | let closure = |el: ElementRef| -> bool {
| ------------------------ found signature of `for<'r> fn(scraper::ElementRef<'r>) -> _`
...
40 | return html.select(&spans).find(closure);
| ---- ^^^^^^^ expected signature of `for<'r> fn(&'r scraper::ElementRef<'_>) -> _`
| |
| required by a bound introduced by this call
What do I need to do differently?

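I think kmdreko's comment was the key: Iterator::find passes each item to the closure by reference, so the closure must accept &ElementRef rather than ElementRef. Destructuring that reference with |&el| fixes the signature.
This works: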
fn find_first_span_by_text_content<'a>(html: &'a Html, text: &'a str) -> Option<ElementRef<'a>> {
let spans = Selector::parse("span").unwrap();
let element_option = html.select(&spans).find(|&el| {
let first_text_option = el.text().next();
if first_text_option.is_none() {
return false;
} else {
return first_text_option.unwrap().to_string() == text.to_string();
}
});
return element_option;
}
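The same rule can be seen without the scraper crate. Below is a minimal std-only sketch, using string slices in place of ElementRef; the function name find_first_by_text is made up for illustration.

```rust
/// Returns the first entry equal to `text`, if any.
///
/// `Iterator::find` passes `&Self::Item` to its closure. Here the items are
/// `&str`, so the closure argument is a `&&str`; the `&item` pattern strips
/// one layer of reference, just like `|&el|` does for `ElementRef`.
fn find_first_by_text<'a>(items: &[&'a str], text: &str) -> Option<&'a str> {
    items.iter().copied().find(|&item| item == text)
}
```

Assuming scraper's text() iterator yields &str items (which the to_string() calls above suggest), the closure body in the answer could probably shrink to a comparison like el.text().next() == Some(text), which also avoids allocating two Strings per element.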

Related

the trait cannot be made into an object in Vec [duplicate]

I wrote a program that has the trait Animal and the struct Dog implementing the trait. It also has a struct AnimalHouse storing an animal as a trait object Box<Animal>.
trait Animal {
fn speak(&self);
}
struct Dog {
name: String,
}
impl Dog {
fn new(name: &str) -> Dog {
return Dog {
name: name.to_string(),
};
}
}
impl Animal for Dog {
fn speak(&self) {
println!("{}: ruff, ruff!", self.name);
}
}
struct AnimalHouse {
animal: Box<Animal>,
}
fn main() {
let house = AnimalHouse {
animal: Box::new(Dog::new("Bobby")),
};
house.animal.speak();
}
It prints "Bobby: ruff, ruff!" as expected, but if I try to clone house the compiler returns errors:
fn main() {
let house = AnimalHouse {
animal: Box::new(Dog::new("Bobby")),
};
let house2 = house.clone();
house2.animal.speak();
}
error[E0599]: no method named `clone` found for type `AnimalHouse` in the current scope
--> src/main.rs:31:24
|
23 | struct AnimalHouse {
| ------------------ method `clone` not found for this
...
31 | let house2 = house.clone();
| ^^^^^
|
= help: items from traits can only be used if the trait is implemented and in scope
= note: the following trait defines an item `clone`, perhaps you need to implement it:
candidate #1: `std::clone::Clone`
I tried to add #[derive(Clone)] before struct AnimalHouse and got another error:
error[E0277]: the trait bound `Animal: std::clone::Clone` is not satisfied
--> src/main.rs:25:5
|
25 | animal: Box<Animal>,
| ^^^^^^^^^^^^^^^^^^^ the trait `std::clone::Clone` is not implemented for `Animal`
|
= note: required because of the requirements on the impl of `std::clone::Clone` for `std::boxed::Box<Animal>`
= note: required by `std::clone::Clone::clone`
How do I make the struct AnimalHouse cloneable? Is it idiomatic Rust to use a trait object actively, in general?
There are a few problems. The first is that there's nothing to require that an Animal also implements Clone. You could fix this by changing the trait definition:
trait Animal: Clone {
/* ... */
}
This would cause Animal to no longer be object safe, meaning that Box<dyn Animal> will become invalid, so that's not great.
What you can do is insert an additional step. To wit (with additions from @ChrisMorgan's comment):
trait Animal: AnimalClone {
fn speak(&self);
}
// Splitting AnimalClone into its own trait allows us to provide a blanket
// implementation for all compatible types, without having to implement the
// rest of Animal. In this case, we implement it for all types that have
// 'static lifetime (*i.e.* they don't contain non-'static pointers), and
// implement both Animal and Clone. Don't ask me how the compiler resolves
// implementing AnimalClone for dyn Animal when Animal requires AnimalClone;
// I have *no* idea why this works.
trait AnimalClone {
fn clone_box(&self) -> Box<dyn Animal>;
}
impl<T> AnimalClone for T
where
T: 'static + Animal + Clone,
{
fn clone_box(&self) -> Box<dyn Animal> {
Box::new(self.clone())
}
}
// We can now implement Clone manually by forwarding to clone_box.
impl Clone for Box<dyn Animal> {
fn clone(&self) -> Box<dyn Animal> {
self.clone_box()
}
}
#[derive(Clone)]
struct Dog {
name: String,
}
impl Dog {
fn new(name: &str) -> Dog {
Dog {
name: name.to_string(),
}
}
}
impl Animal for Dog {
fn speak(&self) {
println!("{}: ruff, ruff!", self.name);
}
}
#[derive(Clone)]
struct AnimalHouse {
animal: Box<dyn Animal>,
}
fn main() {
let house = AnimalHouse {
animal: Box::new(Dog::new("Bobby")),
};
let house2 = house.clone();
house2.animal.speak();
}
By introducing clone_box, we can get around the problems with attempting to clone a trait object.
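Since the question title mentions Vec, here is a compact sketch of the same clone_box technique applied to a vector of boxed animals. The name() accessor and the clone_pack helper are assumptions added for this illustration; they are not part of the answer above.

```rust
trait Animal: AnimalClone {
    // Hypothetical accessor, added so the result can be inspected.
    fn name(&self) -> String;
}

trait AnimalClone {
    fn clone_box(&self) -> Box<dyn Animal>;
}

// Blanket implementation for every sized, 'static type that is both
// an Animal and Clone.
impl<T> AnimalClone for T
where
    T: 'static + Animal + Clone,
{
    fn clone_box(&self) -> Box<dyn Animal> {
        Box::new(self.clone())
    }
}

// Forward Clone for the boxed trait object to clone_box.
impl Clone for Box<dyn Animal> {
    fn clone(&self) -> Box<dyn Animal> {
        self.clone_box()
    }
}

#[derive(Clone)]
struct Dog {
    name: String,
}

impl Animal for Dog {
    fn name(&self) -> String {
        self.name.clone()
    }
}

/// Hypothetical helper: deep-copies a whole pack of boxed animals.
/// This compiles only because `Box<dyn Animal>` implements `Clone`.
fn clone_pack(pack: &[Box<dyn Animal>]) -> Vec<Box<dyn Animal>> {
    pack.to_vec()
}
```

Each element of the returned vector is a fresh allocation, so the copy stays valid after the original vector is dropped.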
My dyn-clone crate implements a reusable version of DK.'s answer. With it you can make your original code work with a bare minimum of changes.
One line to add DynClone as a supertrait of Animal, requiring every animal implementation to be clonable.
One line to generate an implementation of the standard library Clone for Box<dyn Animal>.
// [dependencies]
// dyn-clone = "1.0"
use dyn_clone::{clone_trait_object, DynClone};
trait Animal: DynClone {
fn speak(&self);
}
clone_trait_object!(Animal);
#[derive(Clone)]
struct Dog {
name: String,
}
impl Dog {
fn new(name: &str) -> Dog {
Dog { name: name.to_owned() }
}
}
impl Animal for Dog {
fn speak(&self) {
println!("{}: ruff, ruff!", self.name);
}
}
#[derive(Clone)]
struct AnimalHouse {
animal: Box<dyn Animal>,
}
fn main() {
let house = AnimalHouse {
animal: Box::new(Dog::new("Bobby")),
};
let house2 = house.clone();
house2.animal.speak();
}
The previous answer correctly answers the question about storing a boxed trait object.
This strays from the title, but it is relevant to the idiomatic use of trait objects: an alternative is to use the Rc smart pointer instead of a Box, which avoids the workaround for object safety. Note that cloning an Rc only copies the pointer, so both houses then share the same animal rather than each owning a copy:
use std::rc::Rc;
#[derive(Clone)]
struct AnimalHouse {
animal: Rc<dyn Animal>,
}
fn main() {
let house = AnimalHouse { animal: Rc::new(Dog::new("Bobby")) };
let house2 = house.clone();
house2.animal.speak();
}
Note: Rc<T> is only for use in single-threaded scenarios; there's also Arc<T>.
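To make the sharing behaviour of the Rc approach concrete, here is a small self-contained sketch. The speak method returns a String here (instead of printing) purely so the result can be checked, and shared_houses is a hypothetical helper.

```rust
use std::rc::Rc;

trait Animal {
    fn speak(&self) -> String;
}

struct Dog {
    name: String,
}

impl Animal for Dog {
    fn speak(&self) -> String {
        format!("{}: ruff, ruff!", self.name)
    }
}

// Deriving Clone works because Rc<dyn Animal> is Clone;
// Dog itself does not need to be Clone.
#[derive(Clone)]
struct AnimalHouse {
    animal: Rc<dyn Animal>,
}

/// Hypothetical helper: builds a house and a clone of it.
/// Both houses point at the same Dog allocation.
fn shared_houses() -> (AnimalHouse, AnimalHouse) {
    let house = AnimalHouse {
        animal: Rc::new(Dog {
            name: "Bobby".to_string(),
        }),
    };
    let house2 = house.clone();
    (house, house2)
}
```

Because the two houses share one Dog, this is only a drop-in replacement when shared, immutable access is acceptable; if each house must own an independent animal, the clone_box or dyn-clone approach is the better fit.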

How do I return a vector element from a Rust function?

I would like to return an element of a vector:
struct EntryOne {
pub name: String,
pub value: Option<String>,
}
struct TestVec {}
impl TestVec {
pub fn new() -> TestVec {
TestVec {}
}
pub fn findAll(&self) -> Vec<EntryOne> {
let mut ret = Vec::new();
ret.push(EntryOne {
name: "foo".to_string(),
value: Some("FooVal".to_string()),
});
ret.push(EntryOne {
name: "foo2".to_string(),
value: Some("FooVal2".to_string()),
});
ret.push(EntryOne {
name: "foo3".to_string(),
value: None,
});
ret.push(EntryOne {
name: "foo4".to_string(),
value: Some("FooVal4".to_string()),
});
ret
}
pub fn findOne(&self) -> Option<EntryOne> {
let mut list = &self.findAll();
if list.len() > 0 {
println!("{} elements found", list.len());
list.first()
} else {
None
}
}
}
fn main() {
let test = TestVec::new();
test.findAll();
test.findOne();
}
I always get this error:
error[E0308]: mismatched types
--> src/main.rs:40:13
|
35 | pub fn findOne(&self) -> Option<EntryOne> {
| ---------------- expected `std::option::Option<EntryOne>` because of return type
...
40 | list.first()
| ^^^^^^^^^^^^ expected struct `EntryOne`, found &EntryOne
|
= note: expected type `std::option::Option<EntryOne>`
found type `std::option::Option<&EntryOne>`
How do I return an element?
Look at the signature for Vec::first:
fn first(&self) -> Option<&T>
Given a reference to a vector, it will return a reference to the first item if there is one, and None otherwise. That means that the vector containing the values must outlive the return value, otherwise the reference would point to undefined memory.
There are two main avenues:
If you cannot change the vector, then you will need to make a copy of your data structure. The easiest way to do this is to annotate the structure with #[derive(Clone)]. Then you can call Option::cloned on the result of first.
If you can change the vector, then you can remove the first value from it and return it. There are many ways of doing this, but the shortest code-wise is to use the drain iterator.
#[derive(Debug, Clone)]
struct EntryOne {
name: String,
value: Option<String>,
}
fn find_all() -> Vec<EntryOne> {
vec![
EntryOne {
name: "foo".to_string(),
value: Some("FooVal".to_string()),
},
EntryOne {
name: "foo2".to_string(),
value: Some("FooVal2".to_string()),
},
EntryOne {
name: "foo3".to_string(),
value: None,
},
EntryOne {
name: "foo4".to_string(),
value: Some("FooVal4".to_string()),
},
]
}
fn find_one_by_clone() -> Option<EntryOne> {
find_all().first().cloned()
}
fn find_one_by_drain() -> Option<EntryOne> {
let mut all = find_all();
let mut i = all.drain(0..1);
i.next()
}
fn main() {
println!("{:?}", find_one_by_clone());
println!("{:?}", find_one_by_drain());
}
Additional changes:
There's no need for TestVec if there's no state; just make functions.
Rust style is snake_case for method and variable names.
Use vec! to construct a vector when providing all the elements.
Derive Debug so you can print the value.
If you always want the last element, you can use pop:
fn find_one_by_pop() -> Option<EntryOne> {
find_all().pop()
}
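When the vector is a temporary that is not needed afterwards, as it is here, another option is to consume it and move the first element out, which requires neither Clone nor drain. A minimal sketch; find_one_by_into_iter is a made-up name and the data is shortened:

```rust
#[derive(Debug)]
struct EntryOne {
    name: String,
    value: Option<String>,
}

fn find_all() -> Vec<EntryOne> {
    vec![
        EntryOne {
            name: "foo".to_string(),
            value: Some("FooVal".to_string()),
        },
        EntryOne {
            name: "foo2".to_string(),
            value: None,
        },
    ]
}

/// Consumes the vector and moves its first element out by value.
/// The remaining elements are dropped together with the iterator.
fn find_one_by_into_iter() -> Option<EntryOne> {
    find_all().into_iter().next()
}
```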

Type name T undefined when creating a mutable binary tree

I am new to Rust, and for an exercise, I am building a simple generic binary tree. This is how I'd create one in C++:
template<typename T>
struct Node
{
T data;
Node<T>* parent;
Node<T>* left;
Node<T>* right;
};
template<typename T>
struct Bintree
{
Node<T>* root;
};
But the same(ish) code in Rust doesn't seem to work:
use std::ptr;
struct Node<T> {
data: T,
left: &Node<T>,
right: &Node<T>,
parent: &Node<T>,
}
struct Tree<T> {
root: &Node<T>,
}
impl Tree<T> {
pub fn new() -> Tree<T> {
Tree { root: ptr::null() }
}
pub fn insert(&self, value: T) {
if root.is_null() {
self.root = Node {
data: value,
left: ptr::null(),
right: ptr::null(),
parent: ptr::null(),
};
}
}
}
fn main() {
println!("Hello, world!");
}
And here's the error:
error[E0412]: type name `T` is undefined or not in scope
--> src/main.rs:14:15
|
14 | impl Tree<T> {
| ^ undefined or not in scope
|
= help: no candidates by the name of `T` found in your project; maybe you misspelled the name or forgot to import an external crate?
error[E0412]: type name `T` is undefined or not in scope
--> src/main.rs:15:30
|
15 | pub fn new() -> Tree<T> {
| ^ undefined or not in scope
|
= help: no candidates by the name of `T` found in your project; maybe you misspelled the name or forgot to import an external crate?
error[E0412]: type name `T` is undefined or not in scope
--> src/main.rs:19:37
|
19 | pub fn insert(&self, value: T) {
| ^ undefined or not in scope
|
= help: no candidates by the name of `T` found in your project; maybe you misspelled the name or forgot to import an external crate?
error[E0425]: unresolved name `root`. Did you mean `self.root`?
--> src/main.rs:20:16
|
20 | if root.is_null() {
| ^^^^
error[E0106]: missing lifetime specifier
--> src/main.rs:5:15
|
5 | left: &Node<T>,
| ^ expected lifetime parameter
error[E0106]: missing lifetime specifier
--> src/main.rs:6:16
|
6 | right: &Node<T>,
| ^ expected lifetime parameter
error[E0106]: missing lifetime specifier
--> src/main.rs:7:17
|
7 | parent: &Node<T>,
| ^ expected lifetime parameter
error[E0106]: missing lifetime specifier
--> src/main.rs:11:15
|
11 | root: &Node<T>,
| ^ expected lifetime parameter
I don't really understand what's wrong with that. I don't really get how Rust's pointers work.
In this case, you have a basic syntax error; the generic parameter must be declared on the impl itself:
impl<T> Tree<T>
From there, you'll see that you need if self.root.is_null().
Then, the data structure needs lifetime specifiers, since you are using references. Using the most straightforward of that syntax eventually leads to
error[E0309]: the parameter type `T` may not live long enough
So you use T: 'a there... and you end up with:
use std::ptr;
struct Node<'a, T: 'a> {
data: T,
left: &'a Node<'a, T>,
right: &'a Node<'a, T>,
parent: &'a Node<'a, T>,
}
struct Tree<'a, T: 'a> {
root: &'a Node<'a, T>,
}
impl<'a, T> Tree<'a, T> {
pub fn new() -> Tree<'a, T> {
Tree { root: ptr::null() }
}
pub fn insert(&self, value: T) {
if self.root.is_null() {
self.root = Node {
data: value,
left: ptr::null(),
right: ptr::null(),
parent: ptr::null(),
};
}
}
}
fn main() {
println!("Hello, world!");
}
This gives another error
21 | root: ptr::null(),
| ^^^^^^^^^^^ expected reference, found *-ptr
This is because ptr::null() returns raw pointers, but you've declared that your data structure uses references.
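The distinction can be sketched as follows: a raw pointer may be null, while a reference never is, so "maybe absent" is expressed with Option instead. The function names below are illustrative only.

```rust
use std::ptr;

/// Raw pointers may be null; they can be created and inspected in safe code,
/// but dereferencing one requires `unsafe`.
fn raw_is_null() -> bool {
    let p: *const i32 = ptr::null();
    p.is_null()
}

/// References can never be null. `Option<&T>` plays the role of a nullable
/// pointer, with `None` standing in for null.
fn first_even(values: &[i32]) -> Option<&i32> {
    values.iter().find(|v| **v % 2 == 0)
}
```

Option<&T> has the same size as &T, so this costs nothing compared with a C-style nullable pointer.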
Okay, that's as far as I'm going to go. Let's go back to your question...
I am new to Rust, and for an exercise, I am building a simple generic binary tree.
I would suggest that you consider something other than writing a data structure as a first exercise. They're not simple in Rust. If you still want to take this approach, may I recommend Learning Rust With Entirely Too Many Linked Lists.
I finally found a way to do it. I used Option (std::option::Option) wrapping a Box instead of std::ptr for the node struct, and it works like a nullable C pointer, with None in place of null.
struct Node<T> {
data: T,
left: Option<Box<Node<T>>>,
right: Option<Box<Node<T>>>,
parent: Option<Box<Node<T>>>,
}
struct Tree<T> {
root: Option<Node<T>>,
}
impl<T> Node<T> {
pub fn new(value: Option<T>,
left: Option<Box<Node<T>>>,
right: Option<Box<Node<T>>>,
parent: Option<Box<Node<T>>>)
-> Node<T> {
Node {
data: value.unwrap(),
left: left,
right: right,
parent: parent,
}
}
}
impl<T> Tree<T> {
pub fn new() -> Tree<T> {
Tree { root: None }
}
pub fn insert(&mut self, value: T) {
match self.root {
Some(_) => {
println!("Root is not empty");
}
None => {
println!("Root is empty");
self.root = Some(Node::new(Some(value), None, None, None));
}
}
}
}
fn main() {
println!("Hello, world!");
let mut tree: Tree<i32> = Tree::new();
tree.insert(42);
}
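Building on the Option and Box idea, here is a sketch of how insertion and lookup could work for an ordered tree. It assumes T: Ord and drops the parent field, since an owning Box pointing back at the parent would create an ownership cycle; none of this is from the answer above.

```rust
use std::cmp::Ordering;

/// A node owns its children. `None` plays the role of a null pointer.
struct Node<T> {
    data: T,
    left: Option<Box<Node<T>>>,
    right: Option<Box<Node<T>>>,
}

struct Tree<T> {
    root: Option<Box<Node<T>>>,
}

impl<T: Ord> Tree<T> {
    fn new() -> Self {
        Tree { root: None }
    }

    /// Inserts `value`; returns false if it was already present.
    fn insert(&mut self, value: T) -> bool {
        // Walk down until we reach an empty slot, then fill it.
        let mut slot = &mut self.root;
        while let Some(node) = slot {
            slot = match value.cmp(&node.data) {
                Ordering::Less => &mut node.left,
                Ordering::Greater => &mut node.right,
                Ordering::Equal => return false,
            };
        }
        *slot = Some(Box::new(Node {
            data: value,
            left: None,
            right: None,
        }));
        true
    }

    /// Returns true if `value` is stored in the tree.
    fn contains(&self, value: &T) -> bool {
        let mut current = &self.root;
        while let Some(node) = current {
            current = match value.cmp(&node.data) {
                Ordering::Less => &node.left,
                Ordering::Greater => &node.right,
                Ordering::Equal => return true,
            };
        }
        false
    }
}
```

If parent links are really needed, the usual route is a non-owning handle such as std::rc::Weak or an index into a Vec of nodes, rather than a second owning Box.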

Use of collaterally moved value error on a recursive enum

I have a recursive Item structure that I am using to implement lists:
#[derive(Debug)]
pub enum Item<T> {
Cons(T, Box<Item<T>>),
Nil,
}
When implementing a function that inserts an element after another one, I found out that the Rust compiler wasn't that happy about my code:
pub fn add_after<T>(it: Box<Item<T>>, val: T) -> Box<Item<T>> {
match *it {
Item::Nil => return it,
Item::Cons(a, b) => {
let itm = Box::new(Item::Cons(val, b));
return Box::new(Item::Cons(a, itm));
}
}
}
The errors that I get are pretty obscure for a newbie:
error[E0382]: use of collaterally moved value: `(it as Item::Cons).1`
--> src/main.rs:12:23
|
12 | Item::Cons(a, b) => {
| - ^ value used here after move
| |
| value moved here
|
= note: move occurs because the value has type `T`, which does not implement the `Copy` trait
Another similar question suggested doing the unwrapping in two steps, but that cannot be used here because we need to destructure a two-field Cons(..) item directly, not nested items like Option<Box<Whatever>> where the two-phase trick applies. Example of what I tried:
pub fn add_after<T>(it: Box<Item<T>>, val: T) -> Box<Item<T>> {
match *it {
Item::Nil => return it,
Item::Cons(..) => {
let Item::Cons(a, b) = *it;
let itm = Box::new(Item::Cons(val, b));
return Box::new(Item::Cons(a, itm));
}
}
}
But I get another error:
error[E0005]: refutable pattern in local binding: `Nil` not covered
--> src/main.rs:13:17
|
13 | let Item::Cons(a, b) = *it;
| ^^^^^^^^^^^^^^^^ pattern `Nil` not covered
Though I am pretty sure here that this is exhaustive at this point because we matched a Cons before.
You may be suffering from issue 16223 (see also 22205 which has a closer reproduction), although today's non-lexical lifetimes don't solve this problem. This seems to preclude destructuring multiple things through a Box.
Here's one way to work around it, although it's not the most efficient way as it deallocates and reallocates unnecessarily:
#[derive(Debug)]
pub enum Item<T> {
Cons(T, Box<Item<T>>),
Nil,
}
pub fn add_after<T>(it: Box<Item<T>>, val: T) -> Box<Item<T>> {
match { *it } {
Item::Nil => Box::new(Item::Nil),
Item::Cons(a, b) => {
let itm = Box::new(Item::Cons(val, b));
Box::new(Item::Cons(a, itm))
}
}
}
fn main() {}
A more verbose way pulls the value out of the Box, manipulates it, and then puts the manipulated value back into the same Box. This should need fewer allocations, because the outer Box is reused:
use std::mem;
pub fn add_after<T>(mut item: Box<Item<T>>, val: T) -> Box<Item<T>> {
let unboxed_value = mem::replace(&mut *item, Item::Nil);
match unboxed_value {
Item::Nil => item,
Item::Cons(a, b) => {
let itm = Box::new(Item::Cons(val, b));
*item = Item::Cons(a, itm);
item
}
}
}
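A variation on the mem::replace version, sketched under the assumption that it is acceptable to give Item a Default implementation returning Nil: std::mem::take then does the swap. The to_vec helper is hypothetical and exists only to inspect the result.

```rust
use std::mem;

#[derive(Debug)]
pub enum Item<T> {
    Cons(T, Box<Item<T>>),
    Nil,
}

// Assumption for this sketch: Nil is the natural "empty" value.
impl<T> Default for Item<T> {
    fn default() -> Self {
        Item::Nil
    }
}

pub fn add_after<T>(mut item: Box<Item<T>>, val: T) -> Box<Item<T>> {
    // Move the contents out, leaving Nil behind in the same allocation.
    match mem::take(&mut *item) {
        // The box already holds Nil again, so it can be returned as-is.
        Item::Nil => item,
        Item::Cons(a, b) => {
            // Reuse the outer box; only the new element needs an allocation.
            *item = Item::Cons(a, Box::new(Item::Cons(val, b)));
            item
        }
    }
}

/// Hypothetical helper: collects the list into a Vec for inspection.
pub fn to_vec<T: Clone>(mut item: &Item<T>) -> Vec<T> {
    let mut out = Vec::new();
    while let Item::Cons(value, rest) = item {
        out.push(value.clone());
        item = &**rest;
    }
    out
}
```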
See also:
Collaterally moved error when deconstructing a Box of pairs

How do I pass a reference to mutable data in Rust?

I want to create a mutable struct on the stack and mutate it from helper functions.
#[derive(Debug)]
struct Game {
score: u32,
}
fn addPoint(game: &mut Game) {
game.score += 1;
}
fn main() {
let mut game = Game { score: 0 };
println!("Initial game: {:?}", game);
// This works:
game.score += 1;
// This gives a compile error:
addPoint(&game);
println!("Final game: {:?}", game);
}
Trying to compile this gives:
error[E0308]: mismatched types
--> src/main.rs:19:14
|
19 | addPoint(&game);
| ^^^^^ types differ in mutability
|
= note: expected type `&mut Game`
found type `&Game`
What am I doing wrong?
The reference needs to be marked as mutable too:
addPoint(&mut game);
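Put together, a minimal sketch of the corrected program might look like this, with the helper renamed to snake_case per Rust convention. The play function is a hypothetical wrapper added so the result can be checked.

```rust
#[derive(Debug)]
struct Game {
    score: u32,
}

/// Takes a mutable borrow, so the caller's Game is updated in place.
fn add_point(game: &mut Game) {
    game.score += 1;
}

/// Hypothetical wrapper: plays `rounds` rounds and returns the final game.
fn play(rounds: u32) -> Game {
    // The binding must be `mut` and the borrow must be `&mut`;
    // `&game` would produce a `&Game`, which is the mismatch in the error.
    let mut game = Game { score: 0 };
    for _ in 0..rounds {
        add_point(&mut game);
    }
    game
}
```

Mutability appears in three places here: the binding (let mut), the borrow at the call site (&mut game), and the parameter type (&mut Game). All three have to agree.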
