Why does a node in a linked list using raw pointers become corrupted? - pointers

I am struggling to learn raw pointers while implementing a linked list. A simple piece of code gives me unintended results for which I struggle to find any explanation whatsoever:
use std::cmp::PartialEq;
use std::default::Default;
use std::ptr;
pub struct LinkedListElement<T> {
pub data: T,
pub next: *mut LinkedListElement<T>,
}
pub struct LinkedList<T> {
head: *mut LinkedListElement<T>,
}
impl<T: PartialEq> LinkedListElement<T> {
pub fn new(elem: T, next: Option<*mut LinkedListElement<T>>) -> LinkedListElement<T> {
let mut_ptr = match next {
Some(t) => t,
None => ptr::null_mut(),
};
let new_elem = LinkedListElement {
data: elem,
next: mut_ptr,
};
if !mut_ptr.is_null() {
println!(
"post create ll mut ptr: {:p}, post create ll mut ptr next {:p}",
mut_ptr,
unsafe { (*mut_ptr).next }
);
}
new_elem
}
}
impl<T: PartialEq + Default> LinkedList<T> {
pub fn new(elem: T) -> LinkedList<T> {
LinkedList {
head: &mut LinkedListElement::new(elem, None),
}
}
pub fn insert(&mut self, elem: T) {
println!("head: {:p} . next: {:p}", self.head, unsafe {
(*self.head).next
});
let next = Some(self.head);
let mut ll_elem = LinkedListElement::new(elem, next);
println!(
"before pointer head: {:p}. before pointer next {:p}",
self.head,
unsafe { (*self.head).next }
);
let ll_elem_ptr = &mut ll_elem as *mut LinkedListElement<T>;
self.head = ll_elem_ptr;
}
}
fn main() {
let elem: i32 = 32;
let second_elem: i32 = 64;
let third_elem: i32 = 72;
let mut list = LinkedList::new(elem);
list.insert(second_elem);
list.insert(third_elem);
}
This code gives me the following output:
head: 0x7ffe163275e8 . next: 0x0
post create ll mut ptr: 0x7ffe163275e8, post create ll mut ptr next 0x0
before pointer head: 0x7ffe163275e8. before pointer next 0x0
head: 0x7ffe16327560 . next: 0x7ffe163275e8
post create ll mut ptr: 0x7ffe16327560, post create ll mut ptr next 0x7ffe163275e8
before pointer head: 0x7ffe16327560. before pointer next 0x7ffe16327560
For the first two elements the code behaves as expected: it creates an element with a null pointer as its next element. Here is the state of things after adding the second element:
{
head: {
elem: 64,
next: {
elem: 32,
next: nullptr
}
}
}
64 -> 32 -> null
When the third element is added, things become weird and the linked list transforms into something like this:
{
head: {
elem: 72,
next: {
elem: 72,
next: {
elem: 72,
next: ...
}
}
}
}
72 -> 72 -> 72 -> ...
It seems that the linked list element's next field starts pointing at the element itself.
I have debugged the LinkedListElement::new method and found that the proper element should get returned from it:
{
elem: 72,
next: {
elem: 64,
next: {
elem: 32,
next: nullptr
}
}
}
For some reason, immediately after it is returned to the LinkedList::insert method, even before self.head is reassigned, the contents of the LinkedList (self) become "corrupted".
I know using raw pointers in Rust is not idiomatic but I still want to learn them.

Congratulations, you have successfully proven why Rust needs to exist in the first place: programmers write memory-unsafe code.
First, please read why this is disallowed when using safe Rust:
Is there any way to return a reference to a variable created in a function?
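TL;DR: the memory address of a LinkedListElement changes when it's moved, and a move occurs when a value is returned from a function (among other times). In your code, LinkedList::new stores a pointer to a temporary, and insert stores a pointer to the local variable ll_elem, which stops existing when insert returns. In the output you posted, the next call to insert reuses the same stack slot for its own ll_elem, so the dangling head now points at the element being created, whose next field holds that very address. By using a raw pointer, you've subverted the borrow checker and get no useful feedback from the compiler.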
Second, please read Learning Rust With Entirely Too Many Linked Lists. For whatever reason, programmers think that linked lists are "easy" and a good way to learn a language. This is generally not true in Rust, where memory safety is paramount.
TL;DR: you can use a Box to allocate memory on the heap. This memory address will not change when the pointer to it is moved. You will need to ensure that you appropriately free the pointer when your linked list goes out of scope to prevent memory leaks.
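As a rough sketch of that Box approach (one possible shape, not the only one), the list below allocates each node with Box::new, turns it into a raw pointer with Box::into_raw, and reclaims every node with Box::from_raw in Drop. The names Node and to_vec are made up for this illustration and do not come from the question's code.

```rust
use std::ptr;

// Hypothetical node type for this sketch; each node lives on the heap.
struct Node<T> {
    data: T,
    next: *mut Node<T>,
}

pub struct LinkedList<T> {
    head: *mut Node<T>,
}

impl<T> LinkedList<T> {
    pub fn new(elem: T) -> Self {
        let mut list = LinkedList { head: ptr::null_mut() };
        list.insert(elem);
        list
    }

    // Push a new element at the front of the list.
    pub fn insert(&mut self, elem: T) {
        let node = Box::new(Node { data: elem, next: self.head });
        // The heap allocation does not move when the Box itself is moved,
        // so the raw pointer stays valid after this function returns.
        self.head = Box::into_raw(node);
    }

    // Illustrative helper: collect references to the data, front to back.
    pub fn to_vec(&self) -> Vec<&T> {
        let mut out = Vec::new();
        let mut cur = self.head;
        while !cur.is_null() {
            // SAFETY: every non-null pointer in the list came from
            // Box::into_raw and is only freed in Drop.
            unsafe {
                out.push(&(*cur).data);
                cur = (*cur).next;
            }
        }
        out
    }
}

impl<T> Drop for LinkedList<T> {
    fn drop(&mut self) {
        let mut cur = self.head;
        while !cur.is_null() {
            // SAFETY: each pointer was produced by Box::into_raw exactly once
            // and is converted back exactly once here.
            unsafe {
                let boxed = Box::from_raw(cur);
                cur = boxed.next;
                // boxed is dropped at the end of this block, freeing the node.
            }
        }
    }
}

fn main() {
    let mut list = LinkedList::new(32);
    list.insert(64);
    list.insert(72);
    assert_eq!(list.to_vec(), vec![&72, &64, &32]);
}
```

Without the Drop implementation the nodes would simply leak, which is the memory leak mentioned above.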
See also:
How to copy a raw pointer when implementing a linked list in Rust?
Box::into_raw / Box::from_raw
NonNull

Related

Parallel Recursion Fix

Quite new to Rust and trying to tackle toy problems. Trying to write a directory traversal with only Rayon.
struct Node {
path: PathBuf,
files: Vec<PathBuf>,
hashes: Vec<String>,
folders: Vec<Box<Node>>,
}
impl Node {
pub fn new(path: PathBuf) -> Self {
Node {
path: path,
files: Vec::new(),
hashes: Vec::new(),
folders: Vec::new(),
}
}
pub fn burrow(&mut self) {
let mut contents: Vec<PathBuf> = ls_dir(&self.path);
contents.par_iter().for_each(|item|
if item.is_file() {
self.files.push(*item);
} else if item.is_dir() {
let mut new_folder = Node::new(*item);
new_folder.burrow();
self.folders.push(Box::new(new_folder));
});
}
}
The errors I am receiving are
error[E0596]: cannot borrow `*self.files` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:40:37
|
40 | ... self.files.push(*item);
| ^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
error[E0507]: cannot move out of `*item` which is behind a shared reference
--> src/main.rs:40:53
|
40 | ... self.files.push(*item);
| ^^^^^ move occurs because `*item` has type `PathBuf`, which does not implement the `Copy` trait
error[E0507]: cannot move out of `*item` which is behind a shared reference
--> src/main.rs:42:68
|
42 | ... let mut new_folder = Node::new(*item);
| ^^^^^ move occurs because `*item` has type `PathBuf`, which does not implement the `Copy` trait
error[E0596]: cannot borrow `*self.folders` as mutable, as it is a captured variable in a `Fn` closure
--> src/main.rs:44:37
|
44 | ... self.folders.push(Box::new(new_folder));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
The errors are clear in that they are preventing different threads from accessing mutable memory, but I'm just not sure how to start to address the errors.
Below is the original (non-parallel) version of burrow
pub fn burrow(&mut self) {
let mut contents: Vec<PathBuf> = ls_dir(&self.path);
for item in contents {
if item.is_file() {
self.files.push(item);
} else if item.is_dir() {
let mut new_folder = Node::new(item);
new_folder.burrow();
self.folders.push(Box::new(new_folder));
}
}
}
The best option in this case is to use ParallelIterator::partition_map() which allows you to turn a parallel iterator into two different collections according to some condition, which is exactly what you need to do.
Example program:
use rayon::iter::{Either, IntoParallelIterator, ParallelIterator};
fn main() {
let input = vec!["a", "bb", "c", "dd"];
let (chars, strings): (Vec<char>, Vec<&str>) =
input.into_par_iter().partition_map(|s| {
if s.len() == 1 {
Either::Left(s.chars().next().unwrap())
} else {
Either::Right(s)
}
});
dbg!(chars, strings);
}
If you had three different outputs, unfortunately Rayon does not support that. I haven't looked at whether it'd be possible to build using Rayon's traits, but what I would suggest as a more general (though not quite as efficient) solution is to use channels. A channel like std::sync::mpsc allows any number of threads to insert items while another thread removes them — in your case, to move them into a collection. This would not be quite as efficient as parallel collection, but in an IO-dominated problem like yours, it would not be significant.
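To make the channel idea concrete, here is a minimal sketch using only the standard library (std::sync::mpsc and scoped threads) rather than Rayon. The Kind enum, the classify rule, and split_three_ways are hypothetical names invented for this example; they stand in for whatever three outputs you actually have.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical three-way classification standing in for "three outputs".
enum Kind {
    Small(u32),
    Medium(u32),
    Large(u32),
}

// Hypothetical rule, chosen only for the example.
fn classify(n: u32) -> Kind {
    if n < 10 {
        Kind::Small(n)
    } else if n < 100 {
        Kind::Medium(n)
    } else {
        Kind::Large(n)
    }
}

fn split_three_ways(input: &[u32]) -> (Vec<u32>, Vec<u32>, Vec<u32>) {
    let (tx, rx) = mpsc::channel();

    // Each worker thread gets its own clone of the sender.
    thread::scope(|s| {
        for chunk in input.chunks(2) {
            let tx = tx.clone();
            s.spawn(move || {
                for &n in chunk {
                    tx.send(classify(n)).expect("receiver is still alive");
                }
            });
        }
    });
    // Drop the original sender so the receiving loop below can terminate.
    drop(tx);

    let (mut small, mut medium, mut large) = (Vec::new(), Vec::new(), Vec::new());
    for kind in rx {
        match kind {
            Kind::Small(n) => small.push(n),
            Kind::Medium(n) => medium.push(n),
            Kind::Large(n) => large.push(n),
        }
    }

    // Arrival order depends on thread scheduling, so sort for a stable result.
    small.sort_unstable();
    medium.sort_unstable();
    large.sort_unstable();
    (small, medium, large)
}

fn main() {
    let (small, medium, large) = split_three_ways(&[5, 500, 50, 7, 70, 700]);
    assert_eq!(small, vec![5, 7]);
    assert_eq!(medium, vec![50, 70]);
    assert_eq!(large, vec![500, 700]);
}
```

For simplicity this sketch drains the channel on the calling thread after the workers finish, which works because the channel is unbounded; a dedicated receiving thread would let collection overlap with the work.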
I'm going to skip the separation of files and folders, ignore the structure, and demonstrate a simple recursive approach that gets all the files in a directory recursively:
use std::path::{Path, PathBuf};
fn burrow(dir: &Path) -> Vec<PathBuf> {
let mut contents = vec![];
for entry in std::fs::read_dir(dir).unwrap() {
let entry = entry.unwrap().path();
if entry.is_dir() {
contents.extend(burrow(&entry));
} else {
contents.push(entry);
}
}
contents
}
If you want to use the parallel iterators from rayon, the first step is to convert this loop into a non-parallel iterator chain. The best way to do that is with .flat_map(), which flattens results that yield more than one element:
fn burrow(dir: &Path) -> Vec<PathBuf> {
std::fs::read_dir(dir)
.unwrap()
.flat_map(|entry| {
let entry = entry.unwrap().path();
if entry.is_dir() {
burrow(&entry)
} else {
vec![entry] // use a single-element Vec if not a directory
}
})
.collect()
}
Then, to process this iteration in parallel with rayon, use .par_bridge() to convert the iterator into a parallel iterator. And that's actually it:
use rayon::iter::{ParallelBridge, ParallelIterator};
fn burrow(dir: &Path) -> Vec<PathBuf> {
std::fs::read_dir(dir)
.unwrap()
.par_bridge()
.flat_map(|entry| {
let entry = entry.unwrap().path();
if entry.is_dir() {
burrow(&entry)
} else {
vec![entry]
}
})
.collect()
}
See it working on the playground. You can extend on this to collect more complex results (like folders and hashes and whatever else).

What is the most efficient way to return/move a Vec/Field in rust while also emptying it? [duplicate]

I have a struct with a field:
struct A {
field: SomeType,
}
Given a &mut A, how can I move the value of field and swap in a new value?
fn foo(a: &mut A) {
let mut my_local_var = a.field;
a.field = SomeType::new();
// ...
// do things with my_local_var
// some operations may modify the NEW field's value as well.
}
The end goal would be the equivalent of a get_and_set() operation. I'm not worried about concurrency in this case.
Use std::mem::swap().
fn foo(a: &mut A) {
let mut my_local_var = SomeType::new();
std::mem::swap(&mut a.field, &mut my_local_var);
}
Or std::mem::replace().
fn foo(a: &mut A) {
let mut my_local_var = std::mem::replace(&mut a.field, SomeType::new());
}
If your type implements Default, you can use std::mem::take:
#[derive(Default)]
struct SomeType;
fn foo(a: &mut A) {
let mut my_local_var = std::mem::take(&mut a.field);
}
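Putting these together, a small self-contained sketch of the get_and_set() idea might look like the following. SomeType here is a hypothetical stand-in with a single Vec field, and get_and_set and take_field are names invented for the example.

```rust
use std::mem;

// Hypothetical stand-in for the question's SomeType.
#[derive(Debug, Default, PartialEq)]
struct SomeType {
    values: Vec<i32>,
}

struct A {
    field: SomeType,
}

// Install `new` in the field and hand back the previous value.
fn get_and_set(a: &mut A, new: SomeType) -> SomeType {
    mem::replace(&mut a.field, new)
}

// Move the field out, leaving SomeType::default() behind.
fn take_field(a: &mut A) -> SomeType {
    mem::take(&mut a.field)
}

fn main() {
    let mut a = A { field: SomeType { values: vec![1, 2, 3] } };

    let old = get_and_set(&mut a, SomeType { values: vec![9] });
    assert_eq!(old.values, vec![1, 2, 3]);
    assert_eq!(a.field.values, vec![9]);

    let taken = take_field(&mut a);
    assert_eq!(taken.values, vec![9]);
    assert!(a.field.values.is_empty());
}
```

None of these functions clone the old value; they only move it, so the cost is independent of how much data the field holds.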
If your field happens to be an Option, there's a specific method you can use — Option::take:
struct A {
field: Option<SomeType>,
}
fn foo(a: &mut A) {
let old = a.field.take();
// a.field is now None, old is whatever a.field used to be
}
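The implementation of Option::take is a one-line wrapper over the same mechanism as the more generic answer above (mem::take in the version shown here; other standard library versions write it as the equivalent mem::replace(self, None)), but it is wrapped up nicely for you: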
pub fn take(&mut self) -> Option<T> {
mem::take(self)
}
See also:
Temporarily move out of borrowed content
Change enum variant while moving the field to the new variant

How do you replace the value of a mutable variable by taking ownership of it?

I am working with a LinkedList and I want to remove all elements which do not pass a test. However, I am running into the error cannot move out of borrowed content.
From what I understand, this is because I am working with &mut self, so I do not have the right to invalidate (i.e. move) one of the contained values even for a moment to construct a new list of its values.
In C++/Java, I would simply iterate the list and remove any elements which match a criterion. As there is no remove that I have yet found, I have interpreted it as an iterate, filter, and collect.
The goal is to avoid creating a temporary list, cloning values, and needing to take self and return a "new" object. I have constructed an example which produces the same error. Playground.
use std::collections::LinkedList;
#[derive(Debug)]
struct Example {
list: LinkedList<i8>,
// Other stuff here
}
impl Example {
pub fn default() -> Example {
let mut list = LinkedList::new();
list.push_back(-5);
list.push_back(3);
list.push_back(-1);
list.push_back(6);
Example { list }
}
// Similar idea, but with creating a new list
pub fn get_positive(&self) -> LinkedList<i8> {
self.list.iter()
.filter(|&&x| x > 0)
.map(|x| x.clone())
.collect()
}
// Now, attempt to filter the elements without cloning anything
pub fn remove_negative(&mut self) {
self.list = self.list.into_iter()
.filter(|&x| x > 0)
.collect()
}
}
fn main() {
let mut e = Example::default();
println!("{:?}", e.get_positive());
println!("{:?}", e);
}
In my actual case, I cannot simply consume the wrapping object because it needs to be referenced from different places and contains other important values.
In my research, I found some unsafe code which leads me to question if a safe function could be constructed to perform this action in a similar way to std::mem::replace.
You can std::mem::swap your field with a temp, and then replace it with your modified list like this. The big downside is the creation of the new LinkedList. I don't know how expensive that is.
pub fn remove_negative(&mut self) {
let mut temp = LinkedList::new();
std::mem::swap(&mut temp, &mut self.list);
self.list = temp.into_iter()
.filter(|&x| x > 0)
.collect();
}
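A slightly shorter variant of the same idea, as a sketch: since LinkedList implements Default, std::mem::take can do the swap in one step. As far as I know, an empty LinkedList does not allocate, so the placeholder itself should be cheap; the filtered list is still rebuilt node by node by collect().

```rust
use std::collections::LinkedList;
use std::mem;

#[derive(Debug)]
struct Example {
    list: LinkedList<i8>,
    // Other stuff here
}

impl Example {
    pub fn remove_negative(&mut self) {
        // mem::take moves the list out and leaves an empty one in its place,
        // so self is never left in an invalid state.
        self.list = mem::take(&mut self.list)
            .into_iter()
            .filter(|&x| x > 0)
            .collect();
    }
}

fn main() {
    let mut e = Example {
        list: vec![-5i8, 3, -1, 6].into_iter().collect(),
    };
    e.remove_negative();
    let remaining: Vec<i8> = e.list.iter().copied().collect();
    assert_eq!(remaining, vec![3, 6]);
}
```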
If the goal is to avoid cloning the values themselves, you may use a reference-counting pointer: the clone method on Rc only increments the reference counter.
use std::collections::LinkedList;
use std::rc::Rc;
#[derive(Debug)]
struct Example {
list: LinkedList<Rc<i8>>,
// ...
}
impl Example {
pub fn default() -> Example {
let mut list = LinkedList::new();
list.push_back(Rc::new(-5));
list.push_back(Rc::new(3));
list.push_back(Rc::new(-1));
list.push_back(Rc::new(6));
Example { list }
}
// Similar idea, but with creating a new list
pub fn get_positive(&self) -> LinkedList<Rc<i8>> {
self.list.iter()
.filter(|&x| x.as_ref() > &0)
.map(|x| x.clone())
.collect()
}
// Now, attempt to filter the elements without cloning anything
pub fn remove_negative(&mut self) {
self.list = self.list.iter()
.filter(|&x| x.as_ref() > &0)
.map(|x| x.clone())
.collect()
}
}
fn main() {
let mut e = Example::default();
e.remove_negative();
println!("{:?}", e.get_positive());
println!("{:?}", e);
}

Unable to return a vector of string slices: borrowed value does not live long enough

I'm new to Rust and I'm having some trouble with the borrow checker. I don't understand why this code won't compile. Sorry if this is close to a previously answered question but I can't seem to find a solution in the other questions I've looked at.
I understand the similarity to Return local String as a slice (&str) but in that case it is just one string being returned and not enough for me to reason with my code in which I am trying to return a vector. From what I understand, I am trying to return references to str types that will go out of scope at the end of the function block and so should I be mapping that vector of &str into a vector of String? I am not so concerned about the performance effects of converting &str to String. First I'd just like to get it working.
This is the code, the error is in the lex function.
use std::io::prelude::*;
use std::fs::File;
use std::env;
fn open(mut s: &mut String, filename: &String) {
let mut f = match File::open(&filename) {
Err(_) => panic!("Couldn't open file"),
Ok(file) => file,
};
match f.read_to_string(&mut s) {
Err(_) => panic!("Couldn't read file"),
Ok(_) => println!("File read successfully"),
};
}
fn lex(s: &String) -> Vec<&str> {
let token_string: String = s.replace("(", " ( ")
.replace(")", " ) ");
let token_list: Vec<&str> = token_string.split_whitespace()
.collect();
token_list
}
fn main() {
let args: Vec<_> = env::args().collect();
if args.len() < 2 {
panic!("Please provide a filename");
} else {
let ref filename = args[1];
let mut s = String::new();
open(&mut s, filename);
let token_list: Vec<&str> = lex(&s);
println!("{:?}", token_list);
}
}
Here is the error message
error: borrowed value does not live long enough
self.0.borrow().values.get(idx)
^~~~~~~~~~~~~~~
reference must be valid for the anonymous lifetime #1 defined on the block at 23:54...
pub fn value(&self, idx: usize) -> Option<&Value> {
^
note: ...but borrowed value is only valid for the block at 23:54
pub fn value(&self, idx: usize) -> Option<&Value> {
^
I'm finding it hard to reason with this code because with my level of experience with Rust I can't visualise the lifetimes of these variables. Any help would be appreciated as I've spent an hour or two trying to figure this out.
The problem is that you're allocating a new String (token_string) inside the lex function and then returning an array of references to it, but token_string will get dropped (and the memory freed) as soon as it falls out of scope at the end of the function.
fn lex(s: &String) -> Vec<&str> {
let token_string: String = s.replace("(", " ( ") // <-- new String allocated
.replace(")", " ) ");
let token_list: Vec<&str> = token_string.split_whitespace()
.collect();
token_list // <-- this is just an array of wide pointers into token_string
} // <-- token_string gets freed here, so the returned pointers
// would be pointing to memory that's already been dropped!
There are a couple of ways to address this. One would be to force the caller of lex to pass in the buffer that you want to use to collect into. This would change the signature to fn lex<'a>(input: &String, buffer: &'a mut String) -> Vec<&'a str>. This signature specifies that the returned &strs borrow from the buffer that's passed in, so they stay valid for as long as that buffer is borrowed and cannot outlive it.
Another way would be to just return a Vec<String> instead of Vec<&str> if you can tolerate the extra allocations.
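Both options can be sketched roughly as follows. The function names lex_owned and lex_into are made up for this example, and the parameters are taken as &str rather than &String, which is a small deviation from the signature above (a &String coerces to &str at the call site).

```rust
// Option 1: return owned Strings, at the cost of one allocation per token.
fn lex_owned(s: &str) -> Vec<String> {
    s.replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(String::from)
        .collect()
}

// Option 2: the caller supplies the buffer, and the returned slices borrow from it.
fn lex_into<'a>(input: &str, buffer: &'a mut String) -> Vec<&'a str> {
    *buffer = input.replace('(', " ( ").replace(')', " ) ");
    buffer.split_whitespace().collect()
}

fn main() {
    let source = String::from("(add 1 (mul 2 3))");

    let owned = lex_owned(&source);
    assert_eq!(owned, vec!["(", "add", "1", "(", "mul", "2", "3", ")", ")"]);

    // The buffer must outlive the tokens that point into it.
    let mut buffer = String::new();
    let borrowed = lex_into(&source, &mut buffer);
    assert_eq!(borrowed, vec!["(", "add", "1", "(", "mul", "2", "3", ")", ")"]);
}
```

With the second option the buffer stays mutably borrowed for as long as the token vector is alive, so the caller cannot modify it in the meantime.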

Avoid partially moved value when taking ownership of a recursive data structure?

Say I have a recursive data structure like a singly-linked list, and I want to write a recursive function to insert a value after the last node*:
struct Node {
next: Option<Box<Node>>,
val: i32,
}
fn put_after_node(maybe_node: Option<Box<Node>>, val: i32) -> Box<Node> {
match maybe_node {
None => Box::new(Node { next: None, val: val }),
Some(mut node) => {
// compile error on next line: use of partially moved value: `node`
node.next = Some(put_after_node(node.next, val));
node
}
}
}
Q: How do I avoid the compile error complaining that node has been partially moved?
Failed fix #1: Avoiding taking ownership of the function's arguments, by taking maybe_node: &mut Option<Box<Node>> instead. Failed because I need to add a new node and pass that back up the stack, and if I only have a mutable reference then I need to dereference it, which causes an illegal move out of borrowed value:
fn put_after_node(maybe_node: &mut Option<Box<Node>>, val: i32) -> Box<Node> {
match maybe_node {
&mut None => Box::new(Node { next: None, val: val }),
&mut Some(ref mut node) => {
node.next = Some(put_after_node(&mut node.next, val));
*node // compile error: cannot move out of borrowed content
}
}
}
Failed fix #2: Return a reference to a new node instead (fn ... -> &Box<Node>). Failed because the new node doesn't live long enough (or at least, I can't work out how to specify the lifetime for the new node such that it does live at least as long as the reference to it that'd be returned from the function).
fn put_after_node(maybe_node: &mut Option<Box<Node>>, val: i32) -> &Box<Node> {
match maybe_node {
// compile error on next line: borrowed value does not live long enough
&mut None => &Box::new(Node { next: None, val: val }),
&mut Some(ref mut node) => {
// compile error on next line: cannot move out of borrowed content
node.next = Some(*put_after_node(&mut node.next, val));
node
}
}
}
(* The original snippet is a simplified version of a Rust transliteration that I'm attempting to do of this red black tree implementation's put(). I realise that the minimal example I've outlined here would be better as a loop, but that isn't the case for the code I'm actually trying to write.)
Update: I don't think this is a dup of `cannot move out of dereference of `&mut`-pointer` while building a sorted linked list because a) I'm trying to deal with a different error message & b) my fn takes self - not &mut self. Having said that, I will probably try to rewrite it to take &mut self, so thanks for the pointer @shepmaster.
Take the Option's value using take() (which itself uses mem::replace() under the covers):
fn put_after_node(maybe_node: Option<Box<Node>>, val: i32) -> Box<Node> {
match maybe_node {
None => Box::new(Node { next: None, val: val }),
Some(mut node) => {
// note the node.next.take()
node.next = Some(put_after_node(node.next.take(), val));
node
}
}
}
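To see the fix in action, here is a small runnable sketch that builds a list with the take()-based function above and reads the values back. The values helper is a hypothetical addition for this example only.

```rust
struct Node {
    next: Option<Box<Node>>,
    val: i32,
}

fn put_after_node(maybe_node: Option<Box<Node>>, val: i32) -> Box<Node> {
    match maybe_node {
        None => Box::new(Node { next: None, val }),
        Some(mut node) => {
            // take() moves the tail out and leaves None behind, so `node`
            // is never partially moved and can be returned afterwards.
            node.next = Some(put_after_node(node.next.take(), val));
            node
        }
    }
}

// Hypothetical helper: walk the list and collect the values in order.
fn values(head: &Node) -> Vec<i32> {
    let mut out = vec![head.val];
    let mut cur = &head.next;
    while let Some(node) = cur {
        out.push(node.val);
        cur = &node.next;
    }
    out
}

fn main() {
    let head = put_after_node(None, 1);
    let head = put_after_node(Some(head), 2);
    let head = put_after_node(Some(head), 3);
    assert_eq!(values(&head), vec![1, 2, 3]);
}
```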
