How does converting the primitive str through multiple "as" casts work? - pointers

Here is what I found in Rust's source code. I have difficulty understanding &mut *(self as *mut str as *mut [u8]) and self as *const str as *const u8.
Is it a two-step conversion: first to a *mut str or *const str, and then to a *mut [u8] or *const u8?
#[stable(feature = "str_mut_extras", since = "1.20.0")]
#[inline(always)]
pub unsafe fn as_bytes_mut(&mut self) -> &mut [u8] {
&mut *(self as *mut str as *mut [u8])
}
#[stable(feature = "rust1", since = "1.0.0")]
#[inline]
pub const fn as_ptr(&self) -> *const u8 {
self as *const str as *const u8
}

In Rust, the as operator performs only one conversion step at a time.
There are a few conversions allowed, such as:
&T to *const T,
&mut T to *mut T,
*mut T to *mut U (pending some conditions on T and U),
...
However, even though you can go from &mut T to *mut T to *mut U by using as twice, you cannot go directly from &mut T to *mut U, because both the compiler and humans would have a hard time figuring out the intermediate steps.
So, what's this conversion sequence about?
Going from reference to pointer: typical &T to *const T, or the mut variant.
Going from pointer to str to pointer to [u8]: a typical *const T to *const U for suitable T and U. str actually has the same representation as [u8], but only a subset of values are valid (proper UTF-8 ones). In as_ptr the target is the thin pointer *const u8 instead, so that cast also drops the length.
It's interesting to note that one is safe and not the other:
Since all str are [u8], converting from *str to *[u8] is always safe.
However, exposing &mut [u8] allows breaking invariants inside str, and therefore as_bytes_mut is unsafe.
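To see the two steps separately, the chain in as_bytes_mut can be written out with one cast per line. This is only a sketch: the helper name bytes_mut_two_step is made up for illustration, and it mirrors the standard library code rather than reproducing it.

```rust
/// Hypothetical helper (not part of std): the same cast chain as
/// `as_bytes_mut`, with each `as` step on its own line.
///
/// # Safety
/// The caller must leave valid UTF-8 in the slice before the `str` is used again.
unsafe fn bytes_mut_two_step(s: &mut str) -> &mut [u8] {
    // Step 1: reference -> raw pointer with the same pointee type.
    let p_str: *mut str = s as *mut str;
    // Step 2: raw pointer -> raw pointer with another pointee type
    // (both are fat pointers carrying the length).
    let p_bytes: *mut [u8] = p_str as *mut [u8];
    // Turning the raw pointer back into a reference is the unsafe part.
    unsafe { &mut *p_bytes }
}

fn main() {
    let mut owned = String::from("hello");
    // SAFETY: only ASCII is written, so the contents stay valid UTF-8.
    let bytes = unsafe { bytes_mut_two_step(owned.as_mut_str()) };
    bytes[0] = b'j';
    println!("{}", owned); // jello

    // Skipping the intermediate step does not compile:
    // let p = owned.as_mut_str() as *mut [u8];
}
```

Writing non-UTF-8 bytes through the returned slice would break the str invariant, which is why the function has to stay unsafe even though each individual cast is safe.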

Related

What do the '&&' and star '**' symbols mean in Rust?

fn main() {
let c: i32 = 5;
let rrc = &&c;
println!("{}", rrc); // 5
println!("{}", *rrc); // 5
println!("{}", **rrc); // 5
}
In C/C++, rrc would look like a two-level pointer. In this example, rrc doesn't seem to behave that way in Rust. What do & and * mean in Rust?
The reason they all print the same thing is that references (&T) in Rust implement Display whenever T: Display, by forwarding to the pointed-to type. If you want to print the actual pointer value, you have to use {:p}.
let c = 5;
let rrc = &&c;
println!("{:p}", rrc); // 0x7ffc4e4c7590
println!("{:p}", *rrc); // 0x7ffc4e4c7584
println!("{}", **rrc); // 5
See the playground.
It is simply two reference operators. They all print 5 because printing a reference automatically dereferences it:
fn main() {
let c: i32 = 5;
let rrc = &c;
let rrc = &rrc; // this is &&c
}
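The forwarding described in the answers can be checked with a small generic function. This is a sketch, and the helper name show is made up for illustration; it accepts anything that implements Display, including references to references.

```rust
use std::fmt::Display;

/// Formats any displayable value; `T` may itself be a reference type.
fn show<T: Display>(value: T) -> String {
    format!("{}", value)
}

fn main() {
    let c: i32 = 5;
    let rrc: &&i32 = &&c;

    // `&&i32`, `&i32` and `i32` all format identically, because the
    // reference impls forward to the pointed-to type.
    assert_eq!(show(rrc), "5");
    assert_eq!(show(*rrc), "5");
    assert_eq!(show(**rrc), "5");

    // The types still differ: only the double dereference yields an `i32`.
    let plain: i32 = **rrc;
    println!("{}", plain + 1); // 6
}
```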

Why does dereferencing a raw pointer give a segmentation fault if there is no shared reference to the value pointed at?

Running the following code gives a segmentation fault:
fn main() {
let val = 1;
let ptr = val as *const i32;
unsafe { println!("{:?}", *ptr) };
}
Output:
[1] 69212 segmentation fault (core dumped) cargo r
However, when val is borrowed with & while declaring the raw pointer, the code runs as intended and the value of val is printed.
fn main() {
let val = 1;
let ptr = &val as *const i32;
unsafe { println!("{:?}", *ptr) };
}
Output:
1
So what is the shared reference doing here, and why does the program fail without it? Isn't a reference in Rust also a pointer with extra semantics? Why do we need to create a pointer from a reference and not directly from val itself?
This can be answered by looking at the different semantics of the two lines of code you provided.
fn main() {
let val = 1;
println!("{:?}", val as *const i32); // Output: 0x1
println!("{:?}", &val as *const i32); // Output: 0x7ff7b36a4eec (the exact address will vary)
}
Without the reference, the value of the variable itself (1) is used as the address to dereference. This of course leads to a segmentation fault, since that address is not in the allowed address range of the program.
Only when the reference operator is used is the address of the variable cast to a raw pointer, which can later be dereferenced without a segmentation fault.
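The difference between the two casts can be isolated in two small functions. This is a minimal sketch; the function names are made up for illustration, and the first one never dereferences the bogus pointer, so it is safe to run.

```rust
/// Address produced by casting the integer itself to a pointer.
fn int_as_address(val: usize) -> usize {
    // integer -> pointer: the value becomes the address
    let ptr = val as *const i32;
    ptr as usize
}

/// Reads an `i32` back through a raw pointer made from a reference.
fn read_via_reference(val: &i32) -> i32 {
    // reference -> pointer: the address of the variable
    let ptr = val as *const i32;
    // SAFETY: `ptr` comes from a live reference, so it is valid and aligned.
    unsafe { *ptr }
}

fn main() {
    let val = 1;
    println!("{:#x}", int_as_address(1)); // 0x1
    println!("{}", read_via_reference(&val)); // 1
}
```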

how to read arguments and return from a dll function in R

I'm trying to load a DLL into my R script. The DLL is written in Rust. I read in the R Studio documentation that .Call passes integers as int * in C, which I interpret as &i32 in Rust (also assuming that mutability is just a Rust thing, and I don't have to make it &mut i32 if I don't intend to mutate it). However, R kept crashing the session, so I started doing trial and error. I made this file and tried to load it (the base is taken from this repo):
#![cfg(windows)]
use winapi::shared::minwindef;
use winapi::shared::minwindef::{BOOL, DWORD, HINSTANCE, LPVOID};
use winapi::um::consoleapi;
/// Entry point which will be called by the system once the DLL has been loaded
/// in the target process. Declaring this function is optional.
///
/// # Safety
///
/// What you can safely do inside here is very limited, see the Microsoft documentation
/// about "DllMain". Rust also doesn't officially support a "life before main()",
/// though it is unclear what that means exactly for DllMain.
#[no_mangle]
#[allow(non_snake_case, unused_variables)]
extern "system" fn DllMain(
dll_module: HINSTANCE,
call_reason: DWORD,
reserved: LPVOID)
-> BOOL
{
const DLL_PROCESS_ATTACH: DWORD = 1;
const DLL_PROCESS_DETACH: DWORD = 0;
match call_reason {
DLL_PROCESS_ATTACH => demo_init(),
DLL_PROCESS_DETACH => (),
_ => ()
}
minwindef::TRUE
}
fn demo_init() {
unsafe { consoleapi::AllocConsole() };
println!("Hello, world!");
}
#[no_mangle]
extern "cdecl" fn seven_cdecl_u32() -> u32 {
7
}
#[no_mangle]
extern "cdecl" fn seven_cdecl_u64() -> u64 {
7
}
#[no_mangle]
extern "cdecl" fn seven_cdecl_i32() -> i32 {
7
}
#[no_mangle]
extern "cdecl" fn seven_cdecl_i64() -> i64 {
7
}
#[no_mangle]
extern "stdcall" fn seven_stdcall_u32() -> u32 {
7
}
#[no_mangle]
extern "stdcall" fn seven_stdcall_u64() -> u64 {
7
}
#[no_mangle]
extern "stdcall" fn seven_stdcall_i32() -> i32 {
7
}
#[no_mangle]
extern "stdcall" fn seven_stdcall_i64() -> i64 {
7
}
#[no_mangle]
extern "system" fn seven_system_u32() -> u32 {
7
}
#[no_mangle]
extern "system" fn seven_system_i32() -> i32 {
7
}
#[no_mangle]
extern "system" fn seven_system_u64() -> u64 {
7
}
#[no_mangle]
extern "system" fn seven_system_i64() -> i64 {
7
}
#[no_mangle]
extern "C" fn seven_c_u32() -> u32 {
7
}
#[no_mangle]
extern "C" fn seven_c_i32() -> i32 {
7
}
#[no_mangle]
extern "C" fn seven_c_u64() -> u64 {
7
}
#[no_mangle]
extern "C" fn seven_c_i64() -> i64 {
7
}
CWD = r"(C:\\Users\grass\Desktop\codes\R\dlload)"
dllname = paste(CWD,r"(\rdll.dll)", sep="")
print(getwd())
dyn.load(dllname)
#print(.Call("seven_cdecl_i32", pakage=dllname))
#print(.Call("seven_cdecl_u32", pakage=dllname))
#print(.Call("seven_cdecl_i64", pakage=dllname))
#print(.Call("seven_cdecl_u64", pakage=dllname))
#print(.Call("seven_stdcall_i32", pakage=dllname))
#print(.Call("seven_stdcall_u32", pakage=dllname))
#print(.Call("seven_stdcall_i64", pakage=dllname))
#print(.Call("seven_stdcall_u64", pakage=dllname))
#print(.Call("seven_system_i32", pakage=dllname))
#print(.Call("seven_system_u32", pakage=dllname))
#print(.Call("seven_system_i64", pakage=dllname))
#print(.Call("seven_system_u64", pakage=dllname))
#print(.Call("seven_c_i32", pakage=dllname))
#print(.Call("seven_c_u32", pakage=dllname))
#print(.Call("seven_c_i64", pakage=dllname))
#print(.Call("seven_c_u64", pakage=dllname))
I tried the calls one line at a time, but none of them worked. However, the entry point did work, and the hello world was printed. When I try to print the value of the integer I pass to a function (7), I get absolute garbage, which made me think that the memory layout is different. I read that all values in R are vectors, which changes the layout, but I assumed that .Call is designed with this in mind.
Finally, the documentation in R Studio claims that .C should be used for R-unaware functions, but I don't understand how to get a return value from .C, as it evaluates to a list of parameters and a package name.
If anyone can tell me how to properly get arguments in Rust from R and return from Rust to R, I would be grateful.
Following Rodrigo's comment, I looked into whether I could mutate the value passed in instead of returning it. It seems that there are limited capabilities for passing opaque pointers, so returning structs this way is impossible. But I managed to take and mutate a string value, as shown here:
use std::iter::once;
struct RString {
base_ptr: *mut u8,
len: usize,
}
impl From<*mut *mut u8> for RString {
fn from(base_ptr: *mut *mut u8) -> Self {
Self {
base_ptr: unsafe { base_ptr.read() },
len: {
let mut off: isize = 0;
while '\0' as u8 != unsafe { base_ptr.read().offset(off).read() } {
off += 1;
}
off as usize
}
}
}
}
impl RString {
pub fn value(&self) -> String {
let mut buff: String = String::new();
for off in 0..(self.len as isize) {
buff.push(unsafe { self.base_ptr.clone().offset(off).read() } as char);
}
buff
}
pub fn edit(&mut self, new_value: String) {
self.len = new_value.len();
for (off, val) in (0..(self.len as isize)).zip(new_value.chars().chain(once(0_u8 as char))) {
unsafe {self.base_ptr.clone().offset(off).write(val as u8)};
}
}
}
/// takes a single string argument <a>, returns "Hello <a>!"
#[no_mangle]
extern "system" fn meet_n_greet(nameptr: *mut *mut u8) {
let mut rs: RString = RString::from(nameptr);
println!("Hello {}!", rs.value());
rs.edit(format!("Hello {}!", rs.value()));
}
CWD = r"(C:\\Users\grass\Desktop\codes\R\dlload)"
dllname = paste(CWD,r"(\rdll.dll)", sep="")
dyn.load(dllname)
print(.C("meet_n_greet", "Leroy Jenkins",package=dllname)[1])
dyn.unload(dllname)
The Rust code is ugly and unsafe, but the example does work. This answer does not solve the issue of opaque data either, so I'm just posting it to help people on their way.
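For numeric data the same in-place pattern is simpler, because .C hands integer vectors to native code as int * and returns the (possibly modified) arguments as a list. A minimal sketch, assuming a hypothetical function name double_in_place and assuming the vector length is passed as a second integer argument:

```rust
use std::slice;

/// Hypothetical `.C`-style entry point: doubles every element of `x` in place.
/// In the DLL build it would also carry `#[no_mangle]`, like the functions above.
///
/// # Safety
/// `x` must point to `*n` valid `i32` values and `n` to one valid `i32`.
pub unsafe extern "C" fn double_in_place(x: *mut i32, n: *const i32) {
    unsafe {
        // Treat a negative length as empty instead of wrapping around.
        let len = (*n).max(0) as usize;
        for v in slice::from_raw_parts_mut(x, len) {
            *v *= 2;
        }
    }
}

fn main() {
    // Stand-in for the buffers R would pass; no R session is involved here.
    let mut data = [1, 2, 3];
    let n = data.len() as i32;
    unsafe { double_in_place(data.as_mut_ptr(), &n) };
    println!("{:?}", data); // [2, 4, 6]
}
```

From R this would presumably be called along the lines of .C("double_in_place", as.integer(c(1, 2, 3)), 3L)[[1]], which has the same shape as the meet_n_greet call above.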

Why don't you need to dereference when calling a method, is it syntactic sugar for something else? [duplicate]

I'm learning/experimenting with Rust, and in all the elegance that I find in this language, there is one peculiarity that baffles me and seems totally out of place.
Rust automatically dereferences pointers when making method calls. I made some tests to determine the exact behaviour:
struct X { val: i32 }
impl std::ops::Deref for X {
type Target = i32;
fn deref(&self) -> &i32 { &self.val }
}
trait M { fn m(self); }
impl M for i32 { fn m(self) { println!("i32::m()"); } }
impl M for X { fn m(self) { println!("X::m()"); } }
impl M for &X { fn m(self) { println!("&X::m()"); } }
impl M for &&X { fn m(self) { println!("&&X::m()"); } }
impl M for &&&X { fn m(self) { println!("&&&X::m()"); } }
trait RefM { fn refm(&self); }
impl RefM for i32 { fn refm(&self) { println!("i32::refm()"); } }
impl RefM for X { fn refm(&self) { println!("X::refm()"); } }
impl RefM for &X { fn refm(&self) { println!("&X::refm()"); } }
impl RefM for &&X { fn refm(&self) { println!("&&X::refm()"); } }
impl RefM for &&&X { fn refm(&self) { println!("&&&X::refm()"); } }
struct Y { val: i32 }
impl std::ops::Deref for Y {
type Target = i32;
fn deref(&self) -> &i32 { &self.val }
}
struct Z { val: Y }
impl std::ops::Deref for Z {
type Target = Y;
fn deref(&self) -> &Y { &self.val }
}
#[derive(Clone, Copy)]
struct A;
impl M for A { fn m(self) { println!("A::m()"); } }
impl M for &&&A { fn m(self) { println!("&&&A::m()"); } }
impl RefM for A { fn refm(&self) { println!("A::refm()"); } }
impl RefM for &&&A { fn refm(&self) { println!("&&&A::refm()"); } }
fn main() {
// I'll use # to denote left side of the dot operator
(*X{val:42}).m(); // i32::m() , Self == #
X{val:42}.m(); // X::m() , Self == #
(&X{val:42}).m(); // &X::m() , Self == #
(&&X{val:42}).m(); // &&X::m() , Self == #
(&&&X{val:42}).m(); // &&&X::m() , Self == #
(&&&&X{val:42}).m(); // &&&X::m() , Self == *#
(&&&&&X{val:42}).m(); // &&&X::m() , Self == **#
println!("-------------------------");
(*X{val:42}).refm(); // i32::refm() , Self == #
X{val:42}.refm(); // X::refm() , Self == #
(&X{val:42}).refm(); // X::refm() , Self == *#
(&&X{val:42}).refm(); // &X::refm() , Self == *#
(&&&X{val:42}).refm(); // &&X::refm() , Self == *#
(&&&&X{val:42}).refm(); // &&&X::refm(), Self == *#
(&&&&&X{val:42}).refm(); // &&&X::refm(), Self == **#
println!("-------------------------");
Y{val:42}.refm(); // i32::refm() , Self == *#
Z{val:Y{val:42}}.refm(); // i32::refm() , Self == **#
println!("-------------------------");
A.m(); // A::m() , Self == #
// without the Copy trait, (&A).m() would be a compilation error:
// cannot move out of borrowed content
(&A).m(); // A::m() , Self == *#
(&&A).m(); // &&&A::m() , Self == &#
(&&&A).m(); // &&&A::m() , Self == #
A.refm(); // A::refm() , Self == #
(&A).refm(); // A::refm() , Self == *#
(&&A).refm(); // A::refm() , Self == **#
(&&&A).refm(); // &&&A::refm(), Self == #
}
(Playground)
So, it seems that, more or less:
The compiler will insert as many dereference operators as necessary to invoke a method.
The compiler, when resolving methods declared using &self (call-by-reference):
First tries calling for a single dereference of self
Then tries calling for the exact type of self
Then, tries inserting as many dereference operators as necessary for a match
Methods declared using self (call-by-value) for type T behave as if they were declared using &self (call-by-reference) for type &T and called on the reference to whatever is on the left side of the dot operator.
The above rules are first tried with raw built-in dereferencing, and if there's no match, the overload with Deref trait is used.
What are the exact auto-dereferencing rules? Can anyone give any formal rationale for such a design decision?
Your pseudo-code is pretty much correct. For this example, suppose we had a method call foo.bar() where foo: T. I'm going to use the fully qualified syntax (FQS) to be unambiguous about what type the method is being called with, e.g. A::bar(foo) or A::bar(&***foo). I'm just going to write a pile of random capital letters, each one is just some arbitrary type/trait, except T is always the type of the original variable foo that the method is called on.
The core of the algorithm is:
For each "dereference step" U (that is, set U = T and then U = *T, ...)
if there's a method bar where the receiver type (the type of self in the method) matches U exactly, use it (a "by value method")
otherwise, add one auto-ref (take & or &mut of the receiver), and, if some method's receiver matches &U, use it (an "autorefd method")
Notably, everything considers the "receiver type" of the method, not the Self type of the trait, i.e. impl ... for Foo { fn method(&self) {} } thinks about &Foo when matching the method, and fn method2(&mut self) would think about &mut Foo when matching.
It is an error if there are ever multiple trait methods valid in the inner steps (that is, there can only be zero or one trait methods valid in each of 1. or 2., but there can be one valid for each: the one from 1 will be taken first), and inherent methods take precedence over trait ones. It's also an error if we get to the end of the loop without finding anything that matches. It is also an error to have recursive Deref implementations, which make the loop infinite (they'll hit the "recursion limit").
These rules seem to do-what-I-mean in most circumstances, although having the ability to write the unambiguous FQS form is very useful in some edge cases, and for sensible error messages for macro-generated code.
Only one auto-reference is added because
if there was no bound, things get bad/slow, since every type can have an arbitrary number of references taken
taking one reference &foo retains a strong connection to foo (it is the address of foo itself), but taking more starts to lose it: &&foo is the address of some temporary variable on the stack that stores &foo.
Examples
Suppose we have a call foo.refm(), if foo has type:
X, then we start with U = X, refm has receiver type &..., so step 1 doesn't match, taking an auto-ref gives us &X, and this does match (with Self = X), so the call is RefM::refm(&foo)
&X, starts with U = &X, which matches &self in the first step (with Self = X), and so the call is RefM::refm(foo)
&&&&&X, this doesn't match either step (the trait isn't implemented for &&&&X or &&&&&X), so we dereference once to get U = &&&&X, which matches 1 (with Self = &&&X) and the call is RefM::refm(*foo)
Z, doesn't match either step so it is dereferenced once, to get Y, which also doesn't match, so it's dereferenced again, to get i32, which doesn't match 1, but does match after autorefing, so the call is RefM::refm(&**foo).
&&A, the 1. doesn't match and neither does 2. since the trait is not implemented for &A (for 1) or &&A (for 2), so it is dereferenced to &A, which matches 1., with Self = A
Suppose we have foo.m(), and that A isn't Copy, if foo has type:
A, then U = A matches self directly so the call is M::m(foo) with Self = A
&A, then 1. doesn't match, and neither does 2. (neither &A nor &&A implement the trait), so it is dereferenced to A, which does match, but M::m(*foo) requires taking A by value and hence moving out of foo, hence the error.
&&A, 1. doesn't match, but autorefing gives &&&A, which does match, so the call is M::m(&foo) with Self = &&&A.
(This answer is based on the code, and is reasonably close to the (slightly outdated) README. Niko Matsakis, the main author of this part of the compiler/language, also glanced over this answer.)
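The refm examples above can be checked by comparing each method call with its fully qualified form. This is a simplified sketch: X is reduced to a unit struct without the Deref impl from the question, and the methods return a string instead of printing it.

```rust
struct X;

trait RefM {
    fn refm(&self) -> &'static str;
}
impl RefM for X {
    fn refm(&self) -> &'static str { "X::refm" }
}
impl RefM for &X {
    fn refm(&self) -> &'static str { "&X::refm" }
}
impl RefM for &&X {
    fn refm(&self) -> &'static str { "&&X::refm" }
}
impl RefM for &&&X {
    fn refm(&self) -> &'static str { "&&&X::refm" }
}

fn main() {
    let x = X;
    // foo: X -> step 1 fails, one auto-ref matches: RefM::refm(&foo)
    assert_eq!(x.refm(), RefM::refm(&x));
    assert_eq!(x.refm(), "X::refm");

    // foo: &X -> the receiver already matches `&self`: RefM::refm(foo)
    let r = &x;
    assert_eq!(r.refm(), RefM::refm(r));
    assert_eq!(r.refm(), "X::refm");

    // foo: &&&&&X -> one dereference, then a by-value match: RefM::refm(*foo)
    let r5 = &&&&&x;
    assert_eq!(r5.refm(), RefM::refm(*r5));
    assert_eq!(r5.refm(), "&&&X::refm");
}
```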
The Rust reference has a chapter about the method call expression. I copied the most important part below. Reminder: we are talking about an expression recv.m(), where recv is called "receiver expression" below.
The first step is to build a list of candidate receiver types. Obtain these by repeatedly dereferencing the receiver expression's type, adding each type encountered to the list, then finally attempting an unsized coercion at the end, and adding the result type if that is successful. Then, for each candidate T, add &T and &mut T to the list immediately after T.
For instance, if the receiver has type Box<[i32;2]>, then the candidate types will be Box<[i32;2]>, &Box<[i32;2]>, &mut Box<[i32;2]>, [i32; 2] (by dereferencing), &[i32; 2], &mut [i32; 2], [i32] (by unsized coercion), &[i32], and finally &mut [i32].
Then, for each candidate type T, search for a visible method with a receiver of that type in the following places:
T's inherent methods (methods implemented directly on T [¹]).
Any of the methods provided by a visible trait implemented by T. [...]
(Note about [¹]: I actually think this phrasing is wrong. I've opened an issue. Let's just ignore that sentence in the parenthesis.)
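The Box<[i32; 2]> candidate list quoted above can be exercised directly. In this sketch the two helper functions are made up for illustration: one relies on method lookup, the other spells out the winning receiver type by hand.

```rust
/// Lets method lookup walk the candidate list on its own.
fn len_via_lookup(boxed: Box<[i32; 2]>) -> usize {
    // `len` is an inherent method of `[i32]` taking `&self`, so it is found
    // at the candidate `&[i32]` (deref, then unsized coercion, then auto-ref).
    boxed.len()
}

/// Builds the same receiver by hand and calls the method explicitly.
fn len_spelled_out(boxed: Box<[i32; 2]>) -> usize {
    let as_slice: &[i32] = &*boxed;
    <[i32]>::len(as_slice)
}

fn main() {
    assert_eq!(len_via_lookup(Box::new([10, 20])), 2);
    assert_eq!(len_spelled_out(Box::new([10, 20])), 2);

    // `clone` is found earlier in the list, at `&Box<[i32; 2]>`, so the
    // result is still a box rather than a bare array.
    let boxed: Box<[i32; 2]> = Box::new([10, 20]);
    let copy: Box<[i32; 2]> = boxed.clone();
    assert_eq!(*copy, [10, 20]);
}
```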
Let's go through a few examples from your code in detail! For your examples, we can ignore the part about "unsized coercion" and "inherent methods".
(*X{val:42}).m(): the receiver expression's type is i32. We perform these steps:
Creating list of candidate receiver types:
i32 cannot be dereferenced, so we are already done with step 1. List: [i32]
Next, we add &i32 and &mut i32. List: [i32, &i32, &mut i32]
Searching for methods for each candidate receiver type:
We find <i32 as M>::m which has the receiver type i32. So we are already done.
So far so easy. Now let's pick a more difficult example: (&&A).m(). The receiver expression's type is &&A. We perform these steps:
Creating list of candidate receiver types:
&&A can be dereferenced to &A, so we add that to the list. &A can be dereferenced again, so we also add A to the list. A cannot be dereferenced, so we stop. List: [&&A, &A, A]
Next, for each type T in the list, we add &T and &mut T immediately after T. List: [&&A, &&&A, &mut &&A, &A, &&A, &mut &A, A, &A, &mut A]
Searching for methods for each candidate receiver type:
There is no method with receiver type &&A, so we go to the next type in the list.
We find the method <&&&A as M>::m which indeed has the receiver type &&&A. So we are done.
Here are the candidate receiver lists for all of your examples. The type that is enclosed in ⟪x⟫ is the one that "won", i.e. the first type for which a fitting method could be found. Also remember that the first type in the list is always the receiver expression's type. Lastly, I formatted the list in lines of three, but that's just formatting: this list is a flat list.
(*X{val:42}).m() → <i32 as M>::m
[⟪i32⟫, &i32, &mut i32]
X{val:42}.m() → <X as M>::m
[⟪X⟫, &X, &mut X,
i32, &i32, &mut i32]
(&X{val:42}).m() → <&X as M>::m
[⟪&X⟫, &&X, &mut &X,
X, &X, &mut X,
i32, &i32, &mut i32]
(&&X{val:42}).m() → <&&X as M>::m
[⟪&&X⟫, &&&X, &mut &&X,
&X, &&X, &mut &X,
X, &X, &mut X,
i32, &i32, &mut i32]
(&&&X{val:42}).m() → <&&&X as M>::m
[⟪&&&X⟫, &&&&X, &mut &&&X,
&&X, &&&X, &mut &&X,
&X, &&X, &mut &X,
X, &X, &mut X,
i32, &i32, &mut i32]
(&&&&X{val:42}).m() → <&&&X as M>::m
[&&&&X, &&&&&X, &mut &&&&X,
⟪&&&X⟫, &&&&X, &mut &&&X,
&&X, &&&X, &mut &&X,
&X, &&X, &mut &X,
X, &X, &mut X,
i32, &i32, &mut i32]
(&&&&&X{val:42}).m() → <&&&X as M>::m
[&&&&&X, &&&&&&X, &mut &&&&&X,
&&&&X, &&&&&X, &mut &&&&X,
⟪&&&X⟫, &&&&X, &mut &&&X,
&&X, &&&X, &mut &&X,
&X, &&X, &mut &X,
X, &X, &mut X,
i32, &i32, &mut i32]
(*X{val:42}).refm() → <i32 as RefM>::refm
[i32, ⟪&i32⟫, &mut i32]
X{val:42}.refm() → <X as RefM>::refm
[X, ⟪&X⟫, &mut X,
i32, &i32, &mut i32]
(&X{val:42}).refm() → <X as RefM>::refm
[⟪&X⟫, &&X, &mut &X,
X, &X, &mut X,
i32, &i32, &mut i32]
(&&X{val:42}).refm() → <&X as RefM>::refm
[⟪&&X⟫, &&&X, &mut &&X,
&X, &&X, &mut &X,
X, &X, &mut X,
i32, &i32, &mut i32]
(&&&X{val:42}).refm() → <&&X as RefM>::refm
[⟪&&&X⟫, &&&&X, &mut &&&X,
&&X, &&&X, &mut &&X,
&X, &&X, &mut &X,
X, &X, &mut X,
i32, &i32, &mut i32]
(&&&&X{val:42}).refm() → <&&&X as RefM>::refm
[⟪&&&&X⟫, &&&&&X, &mut &&&&X,
&&&X, &&&&X, &mut &&&X,
&&X, &&&X, &mut &&X,
&X, &&X, &mut &X,
X, &X, &mut X,
i32, &i32, &mut i32]
(&&&&&X{val:42}).refm() → <&&&X as RefM>::refm
[&&&&&X, &&&&&&X, &mut &&&&&X,
⟪&&&&X⟫, &&&&&X, &mut &&&&X,
&&&X, &&&&X, &mut &&&X,
&&X, &&&X, &mut &&X,
&X, &&X, &mut &X,
X, &X, &mut X,
i32, &i32, &mut i32]
Y{val:42}.refm() → <i32 as RefM>::refm
[Y, &Y, &mut Y,
i32, ⟪&i32⟫, &mut i32]
Z{val:Y{val:42}}.refm() → <i32 as RefM>::refm
[Z, &Z, &mut Z,
Y, &Y, &mut Y,
i32, ⟪&i32⟫, &mut i32]
A.m() → <A as M>::m
[⟪A⟫, &A, &mut A]
(&A).m() → <A as M>::m
[&A, &&A, &mut &A,
⟪A⟫, &A, &mut A]
(&&A).m() → <&&&A as M>::m
[&&A, ⟪&&&A⟫, &mut &&A,
&A, &&A, &mut &A,
A, &A, &mut A]
(&&&A).m() → <&&&A as M>::m
[⟪&&&A⟫, &&&&A, &mut &&&A,
&&A, &&&A, &mut &&A,
&A, &&A, &mut &A,
A, &A, &mut A]
A.refm() → <A as RefM>::refm
[A, ⟪&A⟫, &mut A]
(&A).refm() → <A as RefM>::refm
[⟪&A⟫, &&A, &mut &A,
A, &A, &mut A]
(&&A).refm() → <A as RefM>::refm
[&&A, &&&A, &mut &&A,
⟪&A⟫, &&A, &mut &A,
A, &A, &mut A]
(&&&A).refm() → <&&&A as RefM>::refm
[&&&A, ⟪&&&&A⟫, &mut &&&A,
&&A, &&&A, &mut &&A,
&A, &&A, &mut &A,
A, &A, &mut A]
I was troubled by this problem for a long time, especially for this part:
(*X{val:42}).refm(); // i32::refm() , Self == #
X{val:42}.refm(); // X::refm() , Self == #
(&X{val:42}).refm(); // X::refm() , Self == *#
(&&X{val:42}).refm(); // &X::refm() , Self == *#
(&&&X{val:42}).refm(); // &&X::refm() , Self == *#
(&&&&X{val:42}).refm(); // &&&X::refm(), Self == *#
(&&&&&X{val:42}).refm(); // &&&X::refm(), Self == **#
until I found a way to remember these weird rules. I'm not sure if this is correct, but most of the time this method is effective.
The key is: when working out which function gets called, do NOT use the type on the left of the "dot operator" to pick the "impl". Instead, find the function by its signature, and then derive the type of "self" from that signature.
I converted the function definitions as follows:
trait RefM { fn refm(&self); }
impl RefM for i32 { fn refm(&self) { println!("i32::refm()"); } }
// converted to: fn refm(&i32 ) { println!("i32::refm()"); }
// => type of 'self' : i32
// => type of parameter: &i32
impl RefM for X { fn refm(&self) { println!("X::refm()"); } }
// converted to: fn refm(&X ) { println!("X::refm()"); }
// => type of 'self' : X
// => type of parameter: &X
impl RefM for &X { fn refm(&self) { println!("&X::refm()"); } }
// converted to: fn refm(&&X ) { println!("&X::refm()"); }
// => type of 'self' : &X
// => type of parameter: &&X
impl RefM for &&X { fn refm(&self) { println!("&&X::refm()"); } }
// converted to: fn refm(&&&X ) { println!("&&X::refm()"); }
// => type of 'self' : &&X
// => type of parameter: &&&X
impl RefM for &&&X { fn refm(&self) { println!("&&&X::refm()"); } }
// converted to: fn refm(&&&&X) { println!("&&&X::refm()"); }
// => type of 'self' : &&&X
// => type of parameter: &&&&X
Therefore, when you write the code:
(&X{val:42}).refm();
the function
fn refm(&X ) { println!("X::refm()"); }
will be called, because the parameter type is &X.
And if no matching function signature is found, an auto-ref or some auto-derefs are performed.
Methods declared using self (call-by-value) for type T behave as if
they were declared using &self (call-by-reference) for type &T and
called on the reference to whatever is on the left side of the dot
operator.
They don't behave exactly the same. When using self, a move happens (unless the struct is Copy):
let example = X { val: 42};
example.m (); // is the same as M::m (example);
// Not possible: value used here after move
// example.m ();
let example = X { val: 42};
example.refm ();
example.refm ();
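The Copy exception mentioned above can be shown the same way. In this sketch the type Tag and the function call_twice are hypothetical names; the by-value method is called twice on the same variable, which only compiles because each call copies the value.

```rust
#[derive(Clone, Copy)]
struct Tag(i32);

trait M {
    fn m(self) -> i32;
}
impl M for Tag {
    fn m(self) -> i32 { self.0 }
}

fn call_twice(tag: Tag) -> i32 {
    // Each call takes `self` by value; since `Tag` is `Copy`, `tag` is
    // copied rather than moved, so the second call is still allowed.
    tag.m() + tag.m()
}

fn main() {
    println!("{}", call_twice(Tag(21))); // 42
}
```

Removing the derive makes the second tag.m() fail with a use-after-move error, matching the commented-out line in the example above.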

How to get a pointer to a containing struct from a pointer to a member?

I have a type:
struct Foo {
memberA: Bar,
memberB: Baz,
}
and a pointer which I know is a pointer to memberB in Foo:
p: *const Baz
What is the correct way to get a new pointer p: *const Foo which points to the original struct Foo?
My current implementation is the following, which I'm pretty sure invokes undefined behavior due to the dereference of (p as *const Foo) where p is not a pointer to a Foo:
let p2 = p as usize -
((&(*(p as *const Foo)).memberB as *const _ as usize) - (p as usize));
This is part of FFI - I can't easily restructure the code to avoid needing to perform this operation.
This is very similar to Get pointer to object from pointer to some member but for Rust, which as far as I know has no offsetof macro.
The dereference expression produces an lvalue, but that lvalue is not actually read from, we're just doing pointer math on it, so in theory, it should be well defined. That's just my interpretation though.
My solution involves using a null pointer to retrieve the offset to the field, so it's a bit simpler than yours as it avoids one subtraction (we'd be subtracting 0). I believe I saw some C compilers/standard libraries implementing offsetof by essentially returning the address of a field from a null pointer, which is what inspired the following solution.
fn main() {
let p: *const Baz = 0x1248 as *const _;
let p2: *const Foo = unsafe { ((p as usize) - (&(*(0 as *const Foo)).memberB as *const _ as usize)) as *const _ };
println!("{:p}", p2);
}
We can also define our own offset_of! macro:
macro_rules! offset_of {
($ty:ty, $field:ident) => {
unsafe { &(*(0 as *const $ty)).$field as *const _ as usize }
}
}
fn main() {
let p: *const Baz = 0x1248 as *const _;
let p2: *const Foo = ((p as usize) - offset_of!(Foo, memberB)) as *const _;
println!("{:p}", p2);
}
With the implementation of RFC 2582, raw reference MIR operator, it is now possible to get the address of a field in a struct without an instance of the struct and without invoking undefined behavior.
use std::{mem::MaybeUninit, ptr};
struct Example {
a: i32,
b: u8,
c: bool,
}
fn main() {
let offset = unsafe {
let base = MaybeUninit::<Example>::uninit();
let base_ptr = base.as_ptr();
let c = ptr::addr_of!((*base_ptr).c);
(c as usize) - (base_ptr as usize)
};
println!("{}", offset);
}
The implementation of this is tricky and nuanced. It is best to use a crate that is well-maintained, such as memoffset.
Before this functionality was stabilized, you had to have a valid instance of the struct. You can use tools like once_cell to minimize the overhead of the dummy value that you need to create:
use once_cell::sync::Lazy; // 1.4.1
struct Example {
a: i32,
b: u8,
c: bool,
}
static DUMMY: Lazy<Example> = Lazy::new(|| Example {
a: 0,
b: 0,
c: false,
});
static OFFSET_C: Lazy<usize> = Lazy::new(|| {
let base: *const Example = &*DUMMY;
let c: *const bool = &DUMMY.c;
(c as usize) - (base as usize)
});
fn main() {
println!("{}", *OFFSET_C);
}
If you must have this at compile time, you can place similar code into a build script and write out a Rust source file with the offsets. However, that will span multiple compiler invocations, so you are relying on the struct layout not changing between those invocations. Using something with a known representation would reduce that risk.
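On newer toolchains the offset itself is available from the standard library, which makes the original container-from-member question fairly short. A sketch, assuming Rust 1.77 or later (for std::mem::offset_of!) and using concrete field types in place of Bar and Baz; the helper name foo_from_member_b is made up for illustration.

```rust
use std::mem::offset_of;
use std::ptr;

#[repr(C)]
struct Foo {
    member_a: u64,
    member_b: u32,
}

/// Hypothetical helper: recovers the `Foo` that contains the field `p` points to.
///
/// # Safety
/// `p` must point to the `member_b` field of a live `Foo` and must have been
/// derived from a pointer to that whole `Foo`.
unsafe fn foo_from_member_b(p: *const u32) -> *const Foo {
    // Step back by the field offset, measured in bytes, then change the type.
    unsafe { p.byte_sub(offset_of!(Foo, member_b)).cast::<Foo>() }
}

fn main() {
    let foo = Foo { member_a: 7, member_b: 9 };
    let base: *const Foo = &foo;
    // Field pointer derived from the struct pointer, without an intermediate
    // reference that would narrow it to the field alone.
    let field: *const u32 = unsafe { ptr::addr_of!((*base).member_b) };

    let recovered = unsafe { foo_from_member_b(field) };
    assert!(ptr::eq(recovered, base));
    println!("{}", unsafe { (*recovered).member_a }); // 7
}
```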
See also:
How do I create a global, mutable singleton?
How to create a static string at compile time

Resources