Assigning block pointers: differences between Objective-C vs C++ classes - pointers

I’ve found that assigning blocks behaves differently with respect to Objective-C class parameters and C++ class parameters.
Imagine I have this simple Objective-C class hierarchy:
@interface Fruit : NSObject
@end
@interface Apple : Fruit
@end
Then I can write stuff like this:
Fruit *(^getFruit)();
Apple *(^getApple)();
getFruit = getApple;
This means that, with respect to Objective-C classes, blocks are covariant in their return type: a block that returns something more specific can be seen as a “subclass” of a block returning something more general. Here, the getApple block, which delivers an apple, can be safely assigned to the getFruit block. Indeed, if it is used later, it's always safe to receive an Apple * when you're expecting a Fruit *. And, logically, the converse does not work: getApple = getFruit; doesn't compile, because when we really want an apple, we're not happy getting just a fruit.
Similarly, I can write this:
void (^eatFruit)(Fruit *);
void (^eatApple)(Apple *);
eatApple = eatFruit;
This shows that blocks are contravariant in their argument types: a block that can process a more general argument can be used where a block that processes a more specific argument is needed. If a block knows how to eat a fruit, it will know how to eat an apple as well. Again, the converse is not true, and this will not compile: eatFruit = eatApple;.
This is all well and good in Objective-C. Now let's try the same thing in C++ or Objective-C++, supposing we have these similar C++ classes:
class FruitCpp {};
class AppleCpp : public FruitCpp {};
class OrangeCpp : public FruitCpp {};
Sadly, these block assignments don't compile any more:
FruitCpp *(^getFruitCpp)();
AppleCpp *(^getAppleCpp)();
getFruitCpp = getAppleCpp; // error!
void (^eatFruitCpp)(FruitCpp *);
void (^eatAppleCpp)(AppleCpp *);
eatAppleCpp = eatFruitCpp; // error!
Clang complains with an “assigning from incompatible type” error. So, with respect to C++ classes, blocks appear to be invariant in the return type and parameter types.
Why is that? Doesn't the same argument I made with Objective-C classes also hold for C++ classes? What am I missing?

This distinction is intentional, due to the differences between the Objective-C and C++ object models. In particular, given a pointer to an Objective-C object, one can convert/cast that pointer to point at a base class or a derived class without actually changing the value of the pointer: the address of the object is the same regardless.
Because C++ allows multiple and virtual inheritance, this is not the case for C++ objects: if I have a pointer to a C++ class and I cast/convert that pointer to point at a base class or a derived class, I may have to adjust the value of the pointer. For example, consider:
class A { int x; };
class B { int y; };
class C : public A, public B { };
B *getC() {
    C *c = new C;
    return c;
}
Let's say that the new C object in getC() gets allocated at address 0x10. The value of the pointer 'c' is 0x10. In the return statement, that pointer to C needs to be adjusted to point at the B subobject within C. Because B comes after A in C's inheritance list, it will (generally) be laid out in memory after A, which means adding an offset of 4 bytes (== sizeof(A)) to the pointer: the returned pointer will be 0x14. Similarly, casting a B* to a C* would subtract 4 bytes from the pointer, to account for B's offset within C. With virtual base classes the idea is the same, but the offsets are no longer compile-time constants: they are looked up through the vtable at run time.
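To make the adjustment visible, here is a minimal sketch that uses the same A/B/C hierarchy, with the members made public so it compiles on its own. The helper names bOffsetInC and showAdjustment are illustrative only. The exact offset depends on the ABI (commonly sizeof(A), i.e. 4 bytes), so the sketch relies only on what is guaranteed: the A and B subobjects have different addresses, and converting back to C* undoes the adjustment.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

class A { public: int x = 0; };
class B { public: int y = 0; };
class C : public A, public B {};

// Byte distance between a C object and its B subobject.
// Commonly sizeof(A) on mainstream ABIs, but the standard does not fix it.
std::intptr_t bOffsetInC(C* c) {
    B* b = c;  // implicit derived-to-base conversion: the compiler adds the offset
    return reinterpret_cast<std::intptr_t>(b) - reinterpret_cast<std::intptr_t>(c);
}

// Prints the pointer values before and after each conversion.
void showAdjustment() {
    C* c = new C;
    B* b = c;                      // C* -> B*: the value may change
    C* back = static_cast<C*>(b);  // B* -> C*: the offset is subtracted again
    std::printf("c=%p b=%p back=%p offset=%ld\n",
                static_cast<void*>(c), static_cast<void*>(b),
                static_cast<void*>(back), static_cast<long>(bOffsetInC(c)));
    delete c;
}
```

On a typical 64-bit Linux or macOS build, the printed b is 4 bytes past c, and back equals c.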
Now, consider the effect this has on an assignment like:
C *(^getC)();
B *(^getB)();
getB = getC;
The getC block returns a pointer to a C. To turn it into a block that returns a pointer to a B, we would need to adjust the pointer returned from each invocation of the block by adding 4 bytes. This isn't an adjustment to the block; it's an adjustment to the pointer value returned by the block. One could implement this by synthesizing a new block that wraps the previous block and performs the adjustment, e.g.,
getB = ^B *() { return getC(); };
This is implementable in the compiler, which already introduces similar "thunks" when a virtual function is overridden by one whose covariant return type needs adjustment. With blocks, however, it causes an additional problem: blocks allow equality comparison with ==, so to evaluate whether "getB == getC", we would have to look through the thunk generated by the assignment "getB = getC" and compare the underlying block pointers. Again, this is implementable, but it would require a much more heavyweight blocks runtime, one able to create (uniqued) thunks that perform these adjustments on the return value (as well as on any contravariant parameters). While all of this is technically possible, the cost (in runtime size, complexity, and execution time) outweighs the benefits.
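The standard C++ std::function makes essentially the trade-off described here. It will wrap a callable whose return type converts to its own, or whose parameters accept its arguments, and it performs the conversion on every call. In exchange, two std::function objects cannot be compared with each other for equality. The sketch below writes those thunks out by hand as lambdas. It uses the A/B/C hierarchy from the answer, repeated with public members so the block stands alone, and the helper names asGetB and asEatC are hypothetical.

```cpp
#include <cassert>
#include <functional>

class A { public: int x = 0; };
class B { public: int y = 0; };
class C : public A, public B {};

// Return-type covariance by hand: wrap a C*-returning callable so it can be
// used where a B*-returning one is expected. The wrapper (a "thunk") applies
// the C* -> B* pointer adjustment to every returned value.
std::function<B*()> asGetB(std::function<C*()> getC) {
    return [getC]() -> B* { return getC(); };
}

// Parameter contravariance by hand: a callable accepting any B* can stand in
// for one accepting C*; the thunk adjusts each incoming C* to a B*.
std::function<void(C*)> asEatC(std::function<void(B*)> eatB) {
    return [eatB](C* c) { eatB(c); };
}
```

Each wrapper is a new object that merely holds the original. Asking whether getB "is" getC therefore has no answer unless something can look through the wrapper, which is the runtime cost the answer describes.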
Getting back to Objective-C, the single-inheritance object model never needs any adjustments to the object pointer: there's only a single address to point at a given Objective-C object, regardless of the static type of the pointer, so covariance/contravariance never requires any thunks, and the block assignment is a simple pointer assignment (+ _Block_copy/_Block_release under ARC).

The feature was probably overlooked. There are commits showing that Clang developers cared about making covariance and contravariance work in Objective-C++ for Objective-C types, but I couldn't find anything for C++ types themselves. The language specification for blocks doesn't mention covariance or contravariance for either C++ or Objective-C.

Related

Specifying Referential transparency in ACSL

I want to find some ACSL annotation that can be applied to a function or function pointer to indicate that it has the property of referential transparency. Some way to say "this function will always return the same value when given the same arguments". So far I haven't found any such way. Can anyone point me to a way to express that?
Maybe some way to refer to an arbitrary logic function? If I could name an unknown logic boolean unknown_function(void* a, void* b) = /* this is unknown */; then I could document a function as having a postcondition that its \result is equal to this arbitrary/unknown logic function?
The larger context is trying to do type-erased comparisons. I want to generally express the concept of "the user has given me void*s to work with and a bool (*)(void const*, void const*) to compare them with, and the user is guaranteeing to me that the function provided really is a strict partial order over whatever those pointers point to." If I had that, then I could start to describe properties of these type-erased objects being sorted, for example.
There is indeed no direct possibility to do that in ACSL: a function contract only specifies what happens during a single call of the function. You could indeed rely on a declared but left undefined logic function, with a reads clause that specifies the part of the C memory state that the function will need to compute its result, e.g.
/*@ logic boolean unknown_function{L}(int* a, int* b) reads a[0 .. 1], b[2 .. 3]; */
But if you work with void * without knowing the size of the underlying objects, this might be tricky to specify. The exception is when the result of unknown_function depends solely on the values of the pointers and not on the contents of the pointed-to objects, in which case you don't need the reads trick.
Note also that contracts over function pointers are not supported yet, which will probably be an issue for what you intend to do, if I understand your last paragraph correctly.
Finally, you might be interested in an upcoming plug-in, RPP, that proposes a way to specify, prove, and use properties relating several calls of one or more C function(s). It is described here and here, and a public release should happen in a not-too-distant future.

How does Rust implement reflection?

Rust has the Any trait, but it also has a "do not pay for what you do not use" policy. How does Rust implement reflection?
My guess is that Rust uses lazy tagging. Every type is initially unassigned, but later if an instance of the type is passed to a function expecting an Any trait, the type is assigned a TypeId.
Or maybe Rust puts a TypeId on every type that its instance is possibly passed to that function? I guess the former would be expensive.
First of all, Rust doesn't have reflection; reflection implies you can get details about a type at runtime, like the fields, methods, interfaces it implements, etc. You can not do this with Rust. The closest you can get is explicitly implementing (or deriving) a trait that provides this information.
Each type gets a TypeId assigned to it at compile time. Because having globally ordered IDs is hard, the ID is an integer derived from a combination of the type's definition, and assorted metadata about the crate in which it's contained. To put it another way: they're not assigned in any sort of order, they're just hashes of the various bits of information that go into defining the type. [1]
If you look at the source for the Any trait, you'll see the single implementation for Any:
impl<T: 'static + ?Sized> Any for T {
    fn get_type_id(&self) -> TypeId { TypeId::of::<T>() }
}
(The bounds can be informally reduced to "all types that aren't borrowed from something else".)
You can also find the definition of TypeId:
pub struct TypeId {
    t: u64,
}
impl TypeId {
    pub const fn of<T: ?Sized + 'static>() -> TypeId {
        TypeId {
            t: unsafe { intrinsics::type_id::<T>() },
        }
    }
}
intrinsics::type_id is an internal function recognised by the compiler that, given a type, returns its internal type ID. This call just gets replaced at compile time with the literal integer type ID; there's no actual call here. [2] That's how TypeId knows what a type's ID is. TypeId, then, is just a wrapper around this u64 to hide the implementation details from users. If you find it conceptually simpler, you can just think of a type's TypeId as being a constant 64-bit integer that the compiler just knows at compile time.
Any forwards to this from get_type_id, meaning that get_type_id is really just binding the trait method to the appropriate TypeId::of method. It's just there to ensure that if you have an Any, you can find out the original type's TypeId.
Now, Any is implemented for most types, but this doesn't mean that all those types actually have an Any implementation floating around in memory. What actually happens is that the compiler only generates the actual code for a type's Any implementation if someone writes code that requires it. [3] In other words, if you never use the Any implementation for a given type, the compiler will never generate it.
This is how Rust fulfills "do not pay for what you do not use": if you never pass a given type as &Any or Box<Any>, then the associated code is never generated and never takes up any space in your compiled binary.
[1]: Frustratingly, this means that a type's TypeId can change value depending on precisely how the library gets compiled, to the point that compiling it as a dependency (as opposed to as a standalone build) causes TypeIds to change.
[2]: Insofar as I am aware. I could be wrong about this, but I'd be really surprised if that's the case.
[3]: This is generally true of generics in Rust.

What's the point of unique_ptr?

Isn't a unique_ptr essentially the same as a direct instance of the object? I mean, there are a few differences with dynamic inheritance, and performance, but is that all unique_ptr does?
Consider this code to see what I mean. Isn't this:
#include <iostream>
#include <memory>
using namespace std;
void print(int a) {
    cout << a << "\n";
}
int main()
{
    unique_ptr<int> a(new int(5));
    print(*a);
    return 0;
}
Almost exactly the same as this:
#include <iostream>
#include <memory>
using namespace std;
void print(int a) {
    cout << a << "\n";
}
int main()
{
    int a = 5;
    print(a);
    return 0;
}
Or am I misunderstanding what unique_ptr should be used for?
In addition to the cases mentioned by Chris Pitman, one more case where you will want std::unique_ptr is when you instantiate sufficiently large objects: then it makes sense to allocate them on the heap rather than on the stack. The stack size is limited, and sooner or later you might run into a stack overflow. That is where std::unique_ptr is useful.
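A minimal sketch of that point. It assumes a default thread stack of a few megabytes; the actual limit is platform-specific. An array of a million doubles (about 8 MB) is risky as a local variable but fine behind a std::unique_ptr, which still frees it automatically. The BigBuffer alias and the sumOfIndices function are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <numeric>

// About 8 MB: likely larger than the default stack on many platforms,
// so declaring one of these as a local variable could overflow the stack.
using BigBuffer = std::array<double, 1'000'000>;

double sumOfIndices() {
    // The array itself lives on the heap; only the small unique_ptr is on the stack.
    auto buf = std::make_unique<BigBuffer>();
    std::iota(buf->begin(), buf->end(), 0.0);  // fill with 0, 1, 2, ...
    return std::accumulate(buf->begin(), buf->end(), 0.0);
}   // buf's destructor releases the heap memory here
```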
The purpose of std::unique_ptr is to provide automatic and exception-safe deallocation of dynamically allocated memory (unlike a raw pointer that must be explicitly deleted in order to be freed and that is easy to inadvertently not get freed in the case of interleaved exceptions).
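A small sketch of the exception-safety point. The Tracked type is a hypothetical helper that counts live instances, and the function throws after allocating. Because the object is owned by a std::unique_ptr, its destructor runs during stack unwinding and nothing leaks.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

void safeWork() {
    auto p = std::make_unique<Tracked>();
    // With 'Tracked* p = new Tracked;' a later 'delete p;' would be skipped
    // by this throw and the object would leak. Here, p is destroyed during
    // stack unwinding, which deletes the Tracked object.
    throw std::runtime_error("failure after allocation");
}
```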
Your question, though, is more about the value of pointers in general than about std::unique_ptr specifically. For simple builtin types like int, there generally is very little reason to use a pointer rather than simply passing or storing the object by value. However, there are three cases where pointers are necessary or useful:
Representing a separate "not set" or "invalid" value.
Allowing modification.
Allowing for different polymorphic runtime types.
Invalid or not set
A pointer supports an additional nullptr value indicating that it has not been set. For example, suppose you want to support all values of a given type (e.g. the entire range of integers) but also represent the notion that the user never entered a value in the interface. That would be a case for using a std::unique_ptr<int>, because you could check whether the pointer is null to tell whether the value was set, without having to sacrifice a valid integer as a "sentinel" value meaning "not set".
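A minimal sketch of that idea, built around a hypothetical parseAge helper. A null std::unique_ptr<int> means "no value was entered", so every int, including 0 and -1, remains available as a real value.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical helper: returns nullptr when the user left the field empty,
// otherwise a heap-allocated int holding the parsed value.
std::unique_ptr<int> parseAge(const std::string& input) {
    if (input.empty()) return nullptr;  // "not set"
    return std::make_unique<int>(std::stoi(input));
}
```

Callers test the pointer itself (if (age) ...) to distinguish "not set" from any particular number.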
Allowing modification
This can also be accomplished with references rather than pointers, but pointers are one way of doing it. If you use a regular value, then you are dealing with a copy of the original, and any modifications affect only that copy. If you use a pointer or a reference, your modifications are visible to the owner of the original instance. With a unique pointer, you can additionally be assured that no one else has a copy, so it is safe to modify without locking.
Polymorphic types
This can likewise be done with references, not just with pointers, but there are cases where due to semantics of ownership or allocation, you would want to use a pointer to do this... When it comes to user-defined types, it is possible to create a hierarchical "inheritance" relationship. If you want your code to operate on all variations of a given type, then you would need to use a pointer or reference to the base type. A common reason to use std::unique_ptr<> for something like this would be if the object is constructed through a factory where the class you are defining maintains ownership of the constructed object. For example:
class Airline {
public:
    Airline(const AirplaneFactory& factory);
    // ...
private:
    // ...
    void AddAirplaneToInventory();
    // Can create many different types of airplanes, such as
    // a Boeing747 or an Airbus320
    const AirplaneFactory& airplane_factory_;
    std::vector<std::unique_ptr<Airplane>> airplanes_;
};
// ...
void Airline::AddAirplaneToInventory() {
    airplanes_.push_back(airplane_factory_.Create());
}
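The Airline excerpt above depends on types that aren't shown. Here is a self-contained sketch that fills them in with hypothetical definitions: an Airplane base class with a virtual name(), two derived models, a factory whose Create() returns std::unique_ptr<Airplane>, and a fleet() accessor added for inspection. Only the ownership pattern is meant to match the answer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical polymorphic hierarchy.
struct Airplane {
    virtual ~Airplane() = default;
    virtual std::string name() const = 0;
};
struct Boeing747 : Airplane { std::string name() const override { return "Boeing 747"; } };
struct Airbus320 : Airplane { std::string name() const override { return "Airbus A320"; } };

// Hypothetical factory: alternates between the two models.
class AirplaneFactory {
public:
    std::unique_ptr<Airplane> Create() const {
        ++count_;
        if (count_ % 2 == 1) return std::make_unique<Boeing747>();
        return std::make_unique<Airbus320>();
    }
private:
    mutable int count_ = 0;
};

class Airline {
public:
    explicit Airline(const AirplaneFactory& factory) : airplane_factory_(factory) {}
    void AddAirplaneToInventory() { airplanes_.push_back(airplane_factory_.Create()); }
    const std::vector<std::unique_ptr<Airplane>>& fleet() const { return airplanes_; }
private:
    const AirplaneFactory& airplane_factory_;
    // The Airline owns its airplanes; each one is deleted when the vector is destroyed.
    std::vector<std::unique_ptr<Airplane>> airplanes_;
};
```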
As you mentioned, virtual classes are one use case. Beyond that, here are two others:
Optional instances of objects. My class may delay instantiating an instance of the object. To do so, I need to use memory allocation but still want the benefits of RAII.
Integrating with C libraries or other libraries that love returning naked pointers. For example, OpenSSL returns pointers from many (poorly documented) functions, some of which you need to clean up. A non-copyable pointer container is perfect for this case, since I can protect the pointer as soon as it is returned.
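A minimal sketch of the first point, delayed instantiation. The Report class and its build step are hypothetical. The member stays null until first use, yet it is still freed automatically (RAII) when the owner is destroyed.

```cpp
#include <cassert>
#include <memory>
#include <string>

class Report {
public:
    // Builds the (hypothetically expensive) text on first access only.
    const std::string& text() {
        if (!text_) text_ = std::make_unique<std::string>(build());
        return *text_;
    }
    bool isBuilt() const { return text_ != nullptr; }
private:
    static std::string build() { return "quarterly summary"; }
    std::unique_ptr<std::string> text_;  // null until text() is first called
};
```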
A unique_ptr functions the same as a normal pointer except that you do not have to remember to free it (in fact it is simply a wrapper around a pointer). After you allocate the memory, you do not have to afterwards call delete on the pointer since the destructor on unique_ptr takes care of this for you.
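A short sketch of that behaviour, using a hypothetical Widget type that counts live instances. It also shows one thing a raw pointer does not enforce: a std::unique_ptr cannot be copied, only moved, so exactly one owner is ever responsible for the delete.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical type that counts how many instances are alive.
struct Widget {
    static int alive;
    Widget() { ++alive; }
    ~Widget() { --alive; }
};
int Widget::alive = 0;

// Returns the live count right after an owning unique_ptr leaves scope.
int aliveAfterScope() {
    {
        auto w = std::make_unique<Widget>();  // no explicit delete needed
    }  // ~unique_ptr deletes the Widget here
    return Widget::alive;
}

// Takes ownership by value; the Widget is deleted when 'w' goes out of scope.
void consume(std::unique_ptr<Widget> w) {
    assert(w != nullptr);
}
```

A call such as consume(w2) would not compile because the copy constructor is deleted. consume(std::move(w2)) transfers ownership and leaves w2 null.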
Two things come to my mind:
You can use it as a generic exception-safe RAII wrapper. Any resource that has a "close" function can be wrapped with unique_ptr easily by using a custom deleter.
There are also times you might have to move a pointer around without knowing its lifetime explicitly. If the only constraint you know is uniqueness, then unique_ptr is an easy solution. You could almost always do manual memory management also in that case, but it is not automatically exception safe and you could forget to delete. Or the position you have to delete in your code could change. The unique_ptr solution could easily be more maintainable.
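A minimal sketch of the custom-deleter pattern. The handle_open/handle_close functions are a hypothetical stand-in for a C API; the same shape fits, for example, std::FILE* with std::fclose. The deleter runs exactly once when the owner goes out of scope (including on early return or exception), and unique_ptr never calls it for a null pointer.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style API that hands out raw pointers.
struct Handle { int id; };
static int open_handles = 0;
Handle* handle_open(int id) { ++open_handles; return new Handle{id}; }
void handle_close(Handle* h) { --open_handles; delete h; }

// Deleter type: unique_ptr invokes it only when the stored pointer is non-null.
struct HandleCloser {
    void operator()(Handle* h) const { handle_close(h); }
};
using HandlePtr = std::unique_ptr<Handle, HandleCloser>;

int useHandle(int id) {
    HandlePtr h(handle_open(id));  // wrap immediately after the C call returns
    return h->id;                  // handle_close runs when h goes out of scope
}
```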

Do I need to use a weak pointer when using C++ `function` blocks (as opposed to Objective C blocks)

If you capture a strong reference to self under ARC in an objective-C style block, you need to use a __weak pointer to avoid an ARC "retain cycle" problem.
// Right way:
- (void)configureBlock {
    XYZBlockKeeper * __weak weakSelf = self;
    self.block = ^{
        [weakSelf doSomething];   // capture the weak reference
                                  // to avoid the reference cycle
    };
}
I really don't know what a retain cycle is, but this answer describes it a bit. I just know you should use a __weak pointer for Objective-C style blocks. See Avoid Strong Reference Cycles when Capturing self.
But my question is, do I need to create a weak pointer when capturing self under a C++ <functional> block?
- (void)configureBlock {
    self.block = [self]() {
        [self doSomething];  // is this ok? It's not an Objective-C block.
    };
}
C++ lambdas can capture variables either by value or by reference (when you declare the lambda, you choose how each variable is captured).
Capturing by reference is not interesting, because references to local variables become invalid after you leave the variable's scope anyway, so there are no memory management issues at all.
Capturing by value: if the captured variable is an Objective-C object pointer type, then it gets interesting. If you are using MRC, nothing happens. If you are using ARC, then yes, the lambda "retains" captured variables of object pointer type, as long as they are __strong (not __weak or __unsafe_unretained). So, yes, it would create a retain cycle.
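A retain cycle has a direct analogue in plain C++ with std::shared_ptr, which is also reference counted. In this sketch, Keeper is a hypothetical stand-in for XYZBlockKeeper. It stores a std::function that captures a shared_ptr to the Keeper itself, so the count never reaches zero. Capturing a std::weak_ptr instead, the counterpart of __weak, breaks the cycle. The sketch illustrates the ownership pattern only; it does not show how ARC handles Objective-C pointers captured by a lambda.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical object that stores a callback, like XYZBlockKeeper.
struct Keeper {
    std::function<void()> block;
    int calls = 0;
    void doSomething() { ++calls; }
};

// Strong capture: the stored lambda owns a shared_ptr to the Keeper that owns
// the lambda, so the reference count cannot drop to zero (a cycle).
std::weak_ptr<Keeper> makeLeakyKeeper() {
    auto self = std::make_shared<Keeper>();
    self->block = [self] { self->doSomething(); };
    return self;  // the returned weak_ptr does not keep the object alive
}

// Weak capture: the lambda holds only a weak_ptr, so no cycle forms.
std::weak_ptr<Keeper> makeSafeKeeper() {
    auto self = std::make_shared<Keeper>();
    std::weak_ptr<Keeper> weakSelf = self;
    self->block = [weakSelf] {
        if (auto strong = weakSelf.lock()) strong->doSomething();
    };
    return self;
}
```

The leaky Keeper stays alive after its last external owner is gone. It is freed only once the cycle is broken by hand, for example by resetting block.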

Go Programming - bypassing access privileges using pointers

Let's say I have the following hierarchy for my project:
fragment/fragment.go
main.go
And in the fragment.go I have the following code, with one getter and no setter:
package fragment
type Fragment struct {
	number int64 // private variable - lower case
}
func (f *Fragment) GetNumber() *int64 {
	return &f.number
}
And in the main.go I create a Fragment and try to change Fragment.number without a setter:
package main
import (
	"fmt"
	"myproject/fragment"
)
func main() {
	f := new(fragment.Fragment)
	fmt.Println(*f.GetNumber()) // prints 0
	//f.number = 8 // error - number is private
	p := f.GetNumber()
	*p = 4 // works. Now f.number is 4
	fmt.Println(*f.GetNumber()) // prints 4
}
So by using the pointer, I changed the private variable outside of the fragment package. I understand that in C, for example, pointers help to avoid copying large structs/arrays and are supposed to let you change whatever they point to. But I don't quite understand how they are supposed to work with private variables.
So my questions are:
Shouldn't the private variables stay private, no matter how they are accessed?
How is this compared to other languages such as C++/Java? Is it the case there too, that private variables can be changed using pointers outside of the class?
My background: I know a bit of C/C++, am rather fluent in Python, and am new to Go. I learn programming as a hobby, so I don't know much about the technical things happening behind the scenes.
You're not bypassing any access privileges. If you acquire a *T from any imported package, then you can always mutate *T, i.e. the pointee as a whole, as in an assignment. The designer of the imported package controls what you can get from the package, so the access-control decision is theirs, not yours.
The exception to the above is structured types (structs): the above still holds, but finer-grained access control over a particular field is determined by the case of the field's name, even when the field is reached through a pointer to the whole struct. The field name must be uppercase to be visible outside its package.
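Wrt C++: the same thing happens. private controls which code may use the member's name, so a public member function that returns a non-const pointer or reference to a private member lets any caller modify that member. Returning a const pointer or a copy prevents it.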
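For comparison, here is a minimal C++ sketch of the Fragment example. It is a hypothetical direct translation, not code from the question. As in Go, access control restricts who may name the member, not who may write through a pointer that the class itself hands out.

```cpp
#include <cassert>
#include <cstdint>

class Fragment {
public:
    // Like the Go GetNumber: hands out a non-const pointer to a private field,
    // so callers can modify the field through it.
    std::int64_t* GetNumber() { return &number_; }
    // Read-only alternative: callers can look but cannot modify.
    const std::int64_t* GetNumberReadOnly() const { return &number_; }
private:
    std::int64_t number_ = 0;
};
```

With GetNumberReadOnly, an assignment such as *f.GetNumberReadOnly() = 4 fails to compile.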
Wrt Java: No. Java has no pointers, so there is no way to obtain a pointer to a private field; Java references are not really comparable to pointers in Go (C, C++, ...).
