Getting pointer by &str - pointers

Consider this pseudocode:
let k = 10;
let ptr = &k as *const k;
println!("{:p}", ptr); // prints address of pointer
let addr = format!("{:p}", ptr);
super-unsafe {
// this would obviously be super unsafe. It may even cause a STATUS_ACCESS_VIOLATION if you try getting memory from a page that the OS didn't allocate to the program!
let ptr_gen = PointerFactory::from_str(addr.as_str());
assert_eq!(k, *ptr_gen);
}
The pseudocode gets the idea across: I want to be able to get a pointer to a certain memory address by its &str representation. Is this... possible?

So essentially what you want to do is parse the string back to an integer (usize) and then interpret that value as a pointer/reference†:
fn main()
{
let i = 12i32;
let r = format!("{:p}", &i);
let x = unsafe
{
let r = r.trim_start_matches("0x");
&*(usize::from_str_radix(&r, 16).unwrap() as *const i32)
};
println!("{}", x);
}
You can try this yourself in the playground.
†As you can see, you don't even need to cast your reference into a raw pointer, the {:p} formatter takes care of representing it as a memory location (index).
Update: As E_net4 mentioned this in the comment section, it is better to use usize here, which is architecture defined unlike the machine sized one. The transmute was not necessary, so I removed it. The third point about undefined behaviour however seems obvious to whomever tries to do something like the above. This answer provides a way to achieve what the OP asked for which doesn't mean this should be used for anything else than academic/experimental purposes :)

Related

How to traverse character elements of *const char pointer in Rust?

I'm new to Rust programing and I have a bit of difficulty when this language is different from C Example, I have a C function as follows:
bool check(char* data, int size){
int i;
for(i = 0; i < size; i++){
if( data[i] != 0x00){
return false;
}
}
return true;
}
How can I convert this function to Rust? I tried it like C, but it has Errors :((
First off, I assume that you want to use as little unsafe code as possible. Otherwise there really isn't any reason to use Rust in the first place, as you forfeit all the advantages it brings you.
Depending on what data represents, there are multiple ways to transfer this to Rust.
First off: Using pointer and length as two separate arguments is not possible in Rust without unsafe. It has the same concept, though; it's called slices. A slice is exactly the same as a pointer-size combination, just that the compiler understands it and checks it for correctness at compile time.
That said, a char* in C could actually be one of four things. Each of those things map to different types in Rust:
Binary data whose deallocation is taken care of somewhere else (in Rust terms: borrowed data)
maps to &[u8], a slice. The actual content of the slice is:
the address of the data as *u8 (hidden from the user)
the length of the data as usize
Binary data that has to be deallocated within this function after using it (in Rust terms: owned data)
maps to Vec<u8>; as soon as it goes out of scope the data is deleted
actual content is:
the address of the data as *u8 (hidden from the user)
the length of the data as usize
the size of the allocation as usize. This allows for efficient push()/pop() operations. It is guaranteed that the length of the data does not exceed the size of the allocation.
A string whose deallocation is taken care of somewhere else (in Rust terms: a borrowed string)
maps to &str, a so called string slice.
This is identical to &[u8] with the additional compile time guarantee that it contains valid UTF-8 data.
A string that has to be deallocated within this function after using it (in Rust terms: an owned string)
maps to String
same as Vec<u8> with the additional compile time guarantee that it contains valid UTF-8 data.
You can create &[u8] references from Vec<u8>'s and &str references from Strings.
Now this is the point where I have to make an assumption. Because the function that you posted checks if all of the elements of data are zero, and returns false if if finds a non-zero element, I assume the content of data is binary data. And because your function does not contain a free call, I assume it is borrowed data.
With that knowledge, this is how the given function would translate to Rust:
fn check(data: &[u8]) -> bool {
for d in data {
if *d != 0x00 {
return false;
}
}
true
}
fn main() {
let x = vec![0, 0, 0];
println!("Check {:?}: {}", x, check(&x));
let y = vec![0, 1, 0];
println!("Check {:?}: {}", y, check(&y));
}
Check [0, 0, 0]: true
Check [0, 1, 0]: false
This is quite a direct translation; it's not really idiomatic to use for loops a lot in Rust. Good Rust code is mostly iterator based; iterators are most of the time zero-cost abstraction that can get compiled very efficiently.
This is how your code would look like if rewritten based on iterators:
fn check(data: &[u8]) -> bool {
data.iter().all(|el| *el == 0x00)
}
fn main() {
let x = vec![0, 0, 0];
println!("Check {:?}: {}", x, check(&x));
let y = vec![0, 1, 0];
println!("Check {:?}: {}", y, check(&y));
}
Check [0, 0, 0]: true
Check [0, 1, 0]: false
The reason this is more idiomatic is that it's a lot easier to read for someone who hasn't written it. It clearly says "return true if all elements are equal to zero". The for based code needs a second to think about to understand if its "all elements are zero", "any element is zero", "all elements are non-zero" or "any element is non-zero".
Note that both versions compile to the exact same bytecode.
Also note that, unlike the C version, the Rust borrow checker guarantees at compile time that data is valid. It's impossible in Rust (without unsafe) to produce a double free, a use-after-free, an out-of-bounds array access or any other kind of undefined behaviour that would cause memory corruption.
This is also the reason why Rust doesn't do pointers without unsafe - it needs the length of the data to check out-of-bounds errors at runtime. That means, accessing data via [] operator is a little more costly in Rust (as it does perform an out-of-bounds check every time), which is the reason why iterator based programming is a thing. Iterators can iterate over data a lot more efficient than directly accessing it via [] operators.

If I return a Vec buffer and a pointer to its internal data, is the pointer valid?

I'm writing some C FFI bindings, and I came up with a situation which I'm unsure whether it works or not. In its simplest form, it would be:
unsafe fn foo() -> (*const u8, Vec<u8>) {
let buf = vec![0, 1, 2];
(buf.as_ptr(), buf)
}
Now using it:
fn main() {
let (ptr, _buf) = foo();
// pass ptr to C function...
}
In the example above, is ptr valid, since _buf lives until the end of the scope?
The question is whether moving the Vec invalidates the pointer into it. And the answer is, it's not decided yet.
This is UCG issue #326.
So it is best to avoid code like that until it is decided. But for what it's worth, as a lot of code relies on that to work, I don't believe it will be decided to be invalid.

Why must pointers used by `offset_from` be derived from a pointer to the same object?

From the standard library:
Both pointers must be derived from a pointer to the same object. (See below for an example.)
let ptr1 = Box::into_raw(Box::new(0u8));
let ptr2 = Box::into_raw(Box::new(1u8));
let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
// Make ptr2_other an "alias" of ptr2, but derived from ptr1.
let ptr2_other = (ptr1 as *mut u8).wrapping_offset(diff);
assert_eq!(ptr2 as usize, ptr2_other as usize);
// Since ptr2_other and ptr2 are derived from pointers to different
// objects, computing their offset is undefined behavior, even though
// they point to the same address!
unsafe {
let zero = ptr2_other.offset_from(ptr2); // Undefined Behavior
}
I do not understand why this must be the case.
This has to do with a concept called "provenance" meaning "the place of origin". The Rust Unsafe Code Guidelines has a section on Pointer Provenance. Its a pretty abstract rule but it explains that its an extra bit of information that is used during compilation that helps guide what pointer transformations are well defined.
// Let's assume the two allocations here have base addresses 0x100 and 0x200.
// We write pointer provenance as `#N` where `N` is some kind of ID uniquely
// identifying the allocation.
let raw1 = Box::into_raw(Box::new(13u8));
let raw2 = Box::into_raw(Box::new(42u8));
let raw2_wrong = raw1.wrapping_add(raw2.wrapping_sub(raw1 as usize) as usize);
// These pointers now have the following values:
// raw1 points to address 0x100 and has provenance #1.
// raw2 points to address 0x200 and has provenance #2.
// raw2_wrong points to address 0x200 and has provenance #1.
// In other words, raw2 and raw2_wrong have same *address*...
assert_eq!(raw2 as usize, raw2_wrong as usize);
// ...but it would be UB to dereference raw2_wrong, as it has the wrong *provenance*:
// it points to address 0x200, which is in allocation #2, but the pointer
// has provenance #1.
The guidelines link to a good article: Pointers Are Complicated and its follow up Pointers Are Complicated II that go into more detail and coined the phrase:
Just because two pointers point to the same address, does not mean they are equal and can be used interchangeably.
Essentially, it is invalid to read a value via a pointer that is outside that pointer's original "allocation" even if you can guarantee a valid object exists there. Allowing such behavior could wreak havoc on the language's aliasing rules and possible optimizations. And there's pretty much never a good reason to do it.
This concept is mostly inherited from C and C++.
If you're curious if you've written code that violates this rule. Running it through miri, the undefined behavior analysis tool, can often find it.
fn main() {
let ptr1 = Box::into_raw(Box::new(0u8));
let ptr2 = Box::into_raw(Box::new(1u8));
let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
let ptr2_other = (ptr1 as *mut u8).wrapping_offset(diff);
assert_eq!(ptr2 as usize, ptr2_other as usize);
unsafe { println!("{} {} {}", *ptr1, *ptr2, *ptr2_other) };
}
error: Undefined Behavior: memory access failed: pointer must be in-bounds at offset 1200, but is outside bounds of alloc1444 which has size 1
--> src/main.rs:7:49
|
7 | unsafe { println!("{} {} {}", *ptr1, *ptr2, *ptr2_other) };
| ^^^^^^^^^^^ memory access failed: pointer must be in-bounds at offset 1200, but is outside bounds of alloc1444 which has size 1
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information

What use case does pointers to pointer (eg **int) have?

This is pointers to pointers
package main
import "fmt"
func main() {
var num int
fmt.Println(&num) // 0x...0
makePointer(&num)
}
func makePointer(firstPointer *int) {
fmt.Println(firstPointer) // 0x...0
fmt.Println(&firstPointer) // 0x...1
makePointerToAPointer(&firstPointer)
}
func makePointerToAPointer(secondPointer **int) {
fmt.Println(secondPointer) // 0x...1
fmt.Println(&secondPointer) // 0x...2
}
When would you actually use this? You can properly come up with something where it would be easier to do something else, but that is not what I asking about. I really want to know where in production you would use this?
Pointers to pointers make sense in function parameters sometimes; not **int probably, but a pointer to a pointer to some struct, where you want the function to be able to change what object a variable points to, not just to change the contents of the struct. For example, there are a few functions in the internals of the Go compiler that take a **Node (see cmd/compile/internal/gc/racewalk.go).
I've also written a couple of functions myself that take a **html.Node; they operate on an HTML page that may or may not have already been parsed into a tree of *html.Nodes, and they may or may not need to parse the page—but if they do, I want to keep the parsed tree around so that I don't have to parse it again. These are in github.com/andybalholm/redwood/prune.go.
They are much more common in languages that do not have multiple return values, since they can be used as a way to return an additional value that is a pointer. Many Objective-C methods take an NSError** as their last parameter so that they can optionally return an NSError*.
The goal to pass a pointer to something is if there is need to modify the pointed value. (We also use pointers to avoid copying large data structures when passing, but that is just for optimization.)
Like in this example:
func main() {
var i int
fmt.Println(i)
inc(&i)
fmt.Println(i)
}
func inc(i *int) {
*i++
}
Output is the expected (try it on the Go Playground):
0
1
If parameter of inc() would receive an int only, it could only modify the copy and not the original value, and so the caller would not observe the changed value.
Same goes with pointer to pointer to something. We use pointer to pointer to something, if we need to modify the pointed value, that is the pointed pointer. Like in this example:
func main() {
var i *int
fmt.Println(i)
alloc(&i, 1)
fmt.Println(i, *i)
setToNil(&i)
fmt.Println(i)
}
func alloc(i **int, initial int) {
*i = new(int)
**i = initial
}
func setToNil(i **int) {
*i = nil
}
Output (try it on the Go Playground):
<nil>
0x1040a130 1
<nil>
The reason why pointer to pointer is not really used is because modifying a pointed value can be substituted by returning the value, and assigning it at the caller:
func main() {
var i *int
fmt.Println(i)
i = alloc(1)
fmt.Println(i, *i)
i = setToNil()
fmt.Println(i)
}
func alloc(initial int) *int {
i := new(int)
*i = initial
return i
}
func setToNil() *int {
return nil // Nothing to do here, assignment happens at the caller!
}
Output is the same (address might be different) (try it on the Go Playground):
<nil>
0x1040a130 1
<nil>
This variant is easier to read and maintain, so this is clearly the favored and wide-spread alternative to functions having to modify a pointer value.
In languages where functions and methods can only have 1 return value, it usually requires additional "work" if the function also wants to return other values besides the pointer, e.g. a wrapper is to be created to accommodate the multiple return values. But since Go supports multiple return values, need for pointer to pointer basically drops to zero as it can be substituted with returning the pointer that would be set to the pointed pointer; and it does not require additional work and does not make code less readable.
This is a very similar case to the builtin append() function: it appends values to a slice. And since the slice value changes (its length increases, also the pointer in it may also change if a new backing array needs to be allocated), append() returns the new slice value which you need to assign (if you want to keep the new slice).
See this related question where a pointer to pointer is proposed (but also returning a pointer is also viable / preferred): Golang: Can the pointer in a struct pointer method be reassigned to another instance?
In the same way a pointer to a value lets you have many references to the same value for a consistent view of the value when it changes, a pointer to a pointer lets you have many references to the same reference for a consistent view of the pointer when it changes to point to a different location in memory.
I can't say I've ever seen it used in practice in Go that I can think of.

How to allocate space for a Vec<T> in Rust?

I want to create a Vec<T> and make some room for it, but I don't know how to do it, and, to my surprise, there is almost nothing in the official documentation about this basic type.
let mut v: Vec<i32> = Vec<i32>(SIZE); // How do I do this ?
for i in 0..SIZE {
v[i] = i;
}
I know I can create an empty Vec<T> and fill it with pushes, but I don't want to do that since I don't always know, when writing a value at index i, if a value was already inserted there yet. I don't want to write, for obvious performance reasons, something like :
if i >= len(v) {
v.push(x);
} else {
v[i] = x;
}
And, of course, I can't use the vec! syntax either.
While vec![elem; count] from the accepted answer is sufficient to create a vector with all elements equal to the same value, there are other convenience functions.
Vec::with_capacity() creates a vector with the given capacity but with zero length. It means that until this capacity is reached, push() calls won't reallocate the vector, making push() essentially free:
fn main() {
let mut v = Vec::with_capacity(10);
for i in 0..10 {
v.push(i);
}
println!("{:?}", v);
}
You can also easily collect() a vector from an iterator. Example:
fn main() {
let v: Vec<_> = (1..10).collect();
println!("{:?}", v);
}
And finally, sometimes your vector contains values of primitive type and is supposed to be used as a buffer (e.g. in network communication). In this case you can use Vec::with_capacity() + set_len() unsafe method:
fn main() {
let mut v = Vec::with_capacity(10);
unsafe { v.set_len(10); }
for i in 0..10 {
v[i] = i;
}
println!("{:?}", v);
}
Note that you have to be extra careful if your vector contains values with destructors or references - it's easy to get a destructor run over a uninitialized piece of memory or to get an invalid reference this way. It will also work right if you only use initialized part of the vector (you have to track it yourself now). To read about all the possible dangers of uninitialized memory, you can read the documentation of mem::uninitialized().
You can use the first syntax of the vec! macro, specifically vec![elem; count]. For example:
vec![1; 10]
will create a Vec<_> containing 10 1s (the type _ will be determined later or default to i32). The elem given to the macro must implement Clone. The count can be a variable, too.
There is the Vec::resize method:
fn resize(&mut self, new_len: usize, value: T)
This code resizes an empty vector to 1024 elements by filling with the value 7:
let mut vec: Vec<i32> = Vec::new();
vec.resize(1024, 7);

Resources