How does Rust implement reflection? - reflection

Rust has the Any trait, but it also has a "do not pay for what you do not use" policy. How does Rust implement reflection?
My guess is that Rust uses lazy tagging. Every type is initially unassigned, but later if an instance of the type is passed to a function expecting an Any trait, the type is assigned a TypeId.
Or maybe Rust puts a TypeId on every type that its instance is possibly passed to that function? I guess the former would be expensive.

First of all, Rust doesn't have reflection; reflection implies you can get details about a type at runtime, like the fields, methods, interfaces it implements, etc. You can not do this with Rust. The closest you can get is explicitly implementing (or deriving) a trait that provides this information.
Each type gets a TypeId assigned to it at compile time. Because having globally ordered IDs is hard, the ID is an integer derived from a combination of the type's definition, and assorted metadata about the crate in which it's contained. To put it another way: they're not assigned in any sort of order, they're just hashes of the various bits of information that go into defining the type. [1]
If you look at the source for the Any trait, you'll see the single implementation for Any:
impl<T: 'static + ?Sized > Any for T {
fn get_type_id(&self) -> TypeId { TypeId::of::<T>() }
}
(The bounds can be informally reduced to "all types that aren't borrowed from something else".)
You can also find the definition of TypeId:
pub struct TypeId {
t: u64,
}
impl TypeId {
pub const fn of<T: ?Sized + 'static>() -> TypeId {
TypeId {
t: unsafe { intrinsics::type_id::<T>() },
}
}
}
intrinsics::type_id is an internal function recognised by the compiler that, given a type, returns its internal type ID. This call just gets replaced at compile time with the literal integer type ID; there's no actual call here. [2] That's how TypeId knows what a type's ID is. TypeId, then, is just a wrapper around this u64 to hide the implementation details from users. If you find it conceptually simpler, you can just think of a type's TypeId as being a constant 64-bit integer that the compiler just knows at compile time.
Any forwards to this from get_type_id, meaning that get_type_id is really just binding the trait method to the appropriate TypeId::of method. It's just there to ensure that if you have an Any, you can find out the original type's TypeId.
Now, Any is implemented for most types, but this doesn't mean that all those types actually have an Any implementation floating around in memory. What actually happens is that the compiler only generates the actual code for a type's Any implementation if someone writes code that requires it. [3] In other words, if you never use the Any implementation for a given type, the compiler will never generate it.
This is how Rust fulfills "do not pay for what do you not use": if you never pass a given type as &Any or Box<Any>, then the associated code is never generated and never takes up any space in your compiled binary.
[1]: Frustratingly, this means that a type's TypeId can change value depending on precisely how the library gets compiled, to the point that compiling it as a dependency (as opposed to as a standalone build) causes TypeIds to change.
[2]: Insofar as I am aware. I could be wrong about this, but I'd be really surprised if that's the case.
[3]: This is generally true of generics in Rust.

Related

Rust, std::cell::Cell - get immutable reference to inner data

Looking through the documentation for std::cell::Cell, I don't see anywhere how I can retrieve a non-mutable reference to inner data. There is only the get_mut method: https://doc.rust-lang.org/std/cell/struct.Cell.html#method.get_mut
I don't want to use this function because I want to have &self instead of &self mut.
I found an alternative solution of taking the raw pointer:
use std::cell::Cell;
struct DbObject {
key: Cell<String>,
data: String
}
impl DbObject {
pub fn new(data: String) -> Self {
Self {
key: Cell::new("some_uuid".into()),
data,
}
}
pub fn assert_key(&self) -> &str {
// setup key in the future if is empty...
let key = self.key.as_ptr();
unsafe {
let inner = key.as_ref().unwrap();
return inner;
}
}
}
fn main() {
let obj = DbObject::new("some data...".into());
let key = obj.assert_key();
println!("Key: {}", key);
}
Is there any way to do this without using unsafe? If not, perhaps RefCell will be more practical here?
Thank you for help!
First of, if you have a &mut T, you can trivially get a &T out of it. So you can use get_mut to get &T.
But to get a &mut T from a Cell<T> you need that cell to be mutable, as get_mut takes a &mut self parameter. And this is by design the only way to get a reference to the inner object of a cell.
By requiring the use of a &mut self method to get a reference out of a cell, you make it possible to check for exclusive access at compile time with the borrow checker. Remember that a cell enables interior mutability, and has a method set(&self, val: T), that is, a method that can modify the value of a non-mut binding! If there was a get(&self) -> &T method, the borrow checker could not ensure that you do not hold a reference to the inner object while setting the object, which would not be safe.
TL;DR: By design, you can't get a &T out of a non-mut Cell<T>. Use get_mut (which requires a mut cell), or set/replace (which work on a non-mut cell). If this is not acceptable, then consider using RefCell, which can get you a &T out of a non-mut instance, at some runtime cost.
In addition to to #mcarton answer, in order to keep interior mutability sound, that is, disallow mutable reference to coexist with other references, we have three different ways:
Using unsafe with the possibility of Undefined Behavior. This is what UnsafeCell does.
Have some runtime checks, involving runtime overhead. This is the approach RefCell, RwLock and Mutex use.
Restrict the operations that can be done with the abstraction. This is what Cell, Atomic* and (the unstable) OnceCell (and thus Lazy that uses it) does (note that the thread-safe types also have runtime overhead because they need to provide some sort of locking). Each provides a different set of allowed operations:
Cell and Atomic* do not let you to get a reference to the contained value, and only replace it as whole (basically, get() and set, though convenience methods are provided on top of these, such as swap()). Projection (cell-of-slice to slice-of-cells) is also available for Cell (field projection is possible, but not provided as part of std).
OnceCell allows you to assign only once and only then take shared reference, guaranteeing that when you assign you have no references and while you have shared references you cannot assign anymore.
Thus, when you need to be able to take a reference into the content, you cannot choose Cell as it was not designed for that - the obvious choice is RefCell, indeed.

Pointers sent to function

I have following code in main():
msgs, err := ch.Consume(
q.Name, // queue
//..
)
cache := ttlru.New(100, ttlru.WithTTL(5 * time.Minute)) //Cache type
//log.Println(reflect.TypeOf(msgs)) 'chan amqp.Delivery'
go func() {
//here I use `cache` and `msgs` as closures. And it works fine.
}
I decided to create separate function for instead of anonymous.
I declared it as func hitCache(cache *ttlru.Cache, msgs *chan amqp.Delivery) {
I get compile exception:
./go_server.go:61: cannot use cache (type ttlru.Cache) as type *ttlru.Cache in argument to hitCache:
*ttlru.Cache is pointer to interface, not interface
./go_server.go:61: cannot use msgs (type <-chan amqp.Delivery) as type *chan amqp.Delivery in argument to hitCache
Question: How should I pass msg and cache into the new function?
Well, if the receiving variable or a function parameter expects a value
of type *T — that is, "a pointer to T",
and you have a variable of type T, to get a pointer to it,
you have to get the address of that variable.
That's because "a pointer" is a value holding an address.
The address-taking operator in Go is &, so you need something like
hitCache(&cache, &msgs)
But note that some types have so-called "reference semantics".
That is, values of them keep references to some "hidden" data structure.
That means when you copy such values, you're copying references which all reference the same data structure.
In Go, the built-in types maps, slices and channels have reference semantics,
and hence you almost never need to pass around pointers to the values of such types (well, sometimes it can be useful but not now).
Interfaces can be thought of to have reference semantics, too (let's not for now digress into discussing this) because each value of any interface type contains two pointers.
So, in your case it's better to merely not declare the formal parameters of your function as pointers — declare them as "plain" types and be done with it.
All in all, you should definitely complete some basic resource on Go which explains these basic matters in more detail and more extensively.
You're using pointers in the function signature but not passing pointers - which is fine; as noted in the comments, there is no reason to use pointers for interface or channel values. Just change the function signature to:
hitCache(cache ttlru.Cache, msgs chan amqp.Delivery)
And it should work fine.
Pointers to interfaces are nearly never used. You may simplify things and use interfaces of pass by value.

How to obtain a KType in Kotlin?

I'm experimenting with the reflection functionality in Kotlin, but I can't seem to understand how to obtain a KType value.
Suppose I have a class that maps phrases to object factories. In case of ambiguity, the user can supply a type parameter that narrows the search to only factories that return that type of object (or some sub-type).
fun mapToFactory(phrase: Phrase,
type: KType = Any::class): Any {...}
type needs to accept just about anything, including Int, which from my experience seems to be treated somewhat specially. By default, it should be something like Any, which means "do not exclude any factories".
How do I assign a default value (or any value) to type?
From your description, sounds like your function should take a KClass parameter, not a KType, and check the incoming objects with isSubclass, not isSubtype.
Types (represented by KType in kotlin-reflect) usually come from signatures of declarations in your code; they denote a broad set of values which functions take as parameters or return. A type consists of the class, generic arguments to that class, and nullability. The problem with types at runtime on JVM is that because of erasure, it's impossible to determine the exact type of a variable of a generic class. For example if you have a list, you cannot determine the generic type of that list at runtime, i.e. you cannot differentiate between List<String> and List<Throwable>.
To answer your initial question though, you can create a KType out of a KClass with createType():
val type: KType = Any::class.createType()
Note that if the class is generic, you need to pass type projections of generic arguments. In simple cases (all type variables can be replaced with star projections), starProjectedType will also work. For more info on createType and starProjectedType, see this answer.
Since Kotlin 1.3.40, you can use the experimental function typeOf<T>() to obtain the KType of any type:
val int: KType = typeOf<Int>()
In contrast to T::class.createType(), this supports nested generic arguments:
val listOfString: KType = typeOf<List<String>>()
The typeOf<T>() function is particularly useful when you want to obtain a KType from a reified type parameter:
inline fun <reified T> printType() {
val type = typeOf<T>()
println(type.toString())
}
Example usage:
fun main(args: Array<String>) {
printType<Map<Int, String>>()
// prints: kotlin.collections.Map<kotlin.Int, kotlin.String>
}
Since this feature is still in experimental status, you need to opt-in with #UseExperimental(ExperimentalStdlibApi::class) around your function that uses typeOf<T>(). As the feature becomes more stable (possibly in Kotlin 1.4), this can be omitted. Also, at this time it is only available for Kotlin/JVM, not Kotlin/Native or Kotlin/JS.
See also:
Release announcement
API Doc (very sparse currently)

Is it possible to declare a tuple struct whose members are private, except for initialization?

Is it possible to declare a tuple struct where the members are hidden for all intents and purposes, except for declaring?
// usize isn't public since I don't want users to manipulate it directly
struct MyStruct(usize);
// But now I can't initialize the struct using an argument to it.
let my_var = MyStruct(0xff)
// ^^^^
// How to make this work?
Is there a way to keep the member private but still allow new structs to be initialized with an argument as shown above?
As an alternative, a method such as MyStruct::new can be implemented, but I'm still interested to know if its possible to avoid having to use a method on the type since it's shorter, and nice for types that wrap a single variable.
Background
Without going into too many details, the only purpose of this type is to wrap a single type (a helper which hides some details, adds some functionality and is optimized away completely when compiled), in this context it's not exactly exposing hidden internals to use the Struct(value) style initializing.
Further, since the wrapper is zero overhead, its a little misleading to use the new method which is often associated with allocation/creation instead of casting.
Just as it's convenient type (int)v or int(v), instead of int::new(v), I'd like to do this for my own type.
It's used often, so the ability to use short expression is very convenient. Currently I'm using a macro which calls a new method, its OK but a little awkward/indirect, hence this question.
Strictly speaking this isn't possible in Rust.
However the desired outcome can be achieved using a normal struct with a like-named function (yes, this works!)
pub struct MyStruct {
value: usize,
}
#[allow(non_snake_case)]
pub fn MyStruct(value: usize) -> MyStruct {
MyStruct { value }
}
Now, you can write MyStruct(5) but not access the internals of MyStruct.
I'm afraid that such a concept is not possible, but for a good reason. Each member of a struct, unless marked with pub, is admitted as an implementation detail that should not raise to the surface of the public API, regardless of when and how the object is currently being used. Under this point of view, the question's goal reaches a conundrum: wishing to keep members private while letting the API user define them arbitrarily is not only uncommon but also not very sensible.
As you mentioned, having a method named new is the recommended approach of doing that. It's not like you're compromising code readability with the extra characters you have to type. Alternatively, for the case where the struct is known to wrap around an item, making the member public can be a possible solution. That, on the other hand, would allow any kind of mutations through a mutable borrow (thus possibly breaking the struct's invariants, as mentioned by #MatthieuM). This decision depends on the intended API.

When is it a good idea to return a pointer to a struct?

I'm learning Go, and I'm a little confused about when to use pointers. Specifically, when returning a struct from a function, when is it appropriate to return the struct instance itself, and when is it appropriate to return a pointer to the struct?
Example code:
type Car struct {
make string
model string
}
func Whatever() {
var car Car
car := Car{"honda", "civic"}
// ...
return car
}
What are the situations where I would want to return a pointer, and where I would not want to? Is there a good rule of thumb?
There are two things you want to keep in mind, performance and API.
How is a Car used? Is it an object which has state? Is it a large struct? Unfortunately, it is impossible to answer when I have no idea what a Car is. Truthfully, the best way is to see what others do and copy them. Eventually, you get a feeling for this sort of thing. I will now describe three examples from the standard library and explain why I think they used what they did.
hash/crc32: The crc32.NewIEEE() function returns a pointer type (actually, an interface, but the underlying type is a pointer). An instance of a hash function has state. As you write information to a hash, it sums up the data so when you call the Sum() method, it will give you the state of that one instance.
time: The time.Date function returns a Time struct. Why? A time is a time. It has no state. It is like an integer where you can compare them, preform maths on them, etc. The API designer decided that a modification to a time would not change the current one but make a new one. As a user of the library, if I want the time one month from now, I would want a new time object, not to change the current one I have. A time is also only 3 words in length. In other words, it is small and there would be no performance gain in using a pointer.
math/big: big.NewInt() is an interesting one. We can pretty much agree that when you modify a big.Int, you will often want a new one. A big.Int has no internal state, so why is it a pointer? The answer is simply performance. The programmers realized that big ints are … big. Constantly allocating each time you do a mathematical operation may not be practical. So, they decided to use pointers and allow the programmer to decide when to allocate new space.
Have I answered your question? Probably not. It is a design decision and you need to figure it out on a case by case basis. I use the standard library as a guide when I am designing my own libraries. It really all comes down to judgement and how you expect client code to use your types.
Very losely, exceptions are likely to show up in specific circumstances:
Return a value when it is really small (no more than few words).
Return a pointer when the copying overhead would substantially hurt performance (size is a lot of words).
Often, when you want to mimic an object-oriented style, where you have an "object" that stores state and "methods" that can alter the object, then you would have a "constructor" function that returns a pointer to a struct (think of it as the "object reference" as in other OO languages). Mutator methods would have to be methods of the pointer-to-the-struct type instead of the struct type itself, in order to change the fields of the "object", so it's convenient to have a pointer to the struct instead of a struct value itself, so that all "methods" will be in its method set.
For example, to mimic something like this in Java:
class Car {
String make;
String model;
public Car(String myMake) { make = myMake; }
public setMake(String newMake) { make = newMake; }
}
You would often see something like this in Go:
type Car struct {
make string
model string
}
func NewCar(myMake string) *Car {
return &Car{myMake, ""}
}
func (self *Car) setMake(newMake string) {
self.make = newMake
}

Resources