Rcpp-Modules: Understanding .finalizer() and why it doesn't get called - r

I have written a Rcpp-Module named RMaxima that spawns a child process when it's constructor is called. My motivation for this question is, that I need an object of this class to be destructed, when R exits a functions execution environment in which it has been allocated and bound to a name.
Here are the important parts of my source file:
class RMaxima
{
public:
RMaxima()
{
...
// spawnes child process by instantiating an object
myMaxima = new Maxima(...);
}
~RMaxima()
{
// causes the child process to terminate properly
delete myMaxima;
}
...
private:
Maxima* myMaxima;
};
static void rmaxima_finalizer(RMaxima* ptr)
{
if (ptr)
{
delete ptr;
}
}
RCPP_MODULE(Maxima)
{
class_<RMaxima>("RMaxima")
.constructor()
.method(...)
.finalizer(&rmaxima_finalizer)
;
}
My understanding of garbage collection in R is that when the interpreter exits an environment the value bindings from this environment are thrown away and R's garbage collection frees up memory for any unbound values.
However, this seems to be different for Rcpp-Modules: If I call and exit a function that creates an instance of my class
foo <- function() {
m <- new(Rmaxima)
...
}
then, as expected, m does not appear in the global environment. However, the child process is still running. This means that my class' destructor/ finalizer has not been called. It is called however, when I quit the R session or when I install my corresponding custom package during which loading is tested.
Why? How can I cause it being destructed in a different scope? "Extending R" (Chambers, 2016) gave me some hint
The Rcpp templated type allows conversions to and from "externalptr"
with computations in the C++ code specialized to the type T of the object referred
to. Templated code can in fact use 3 parameters: type, storage scope and a final-
izer to be called when the object pointed to is deleted. The object returned to R
is in all cases of class and type "externalptr". No information is available about
the parametrized form.
as to point out that classes are returned as externalptr and protected from gc(), but I don't see the connection, since object is typeof(m): S4, i.e. a reference class.
Any further hint, where to read about this?

You are using Rcpp Modules which are in a way an alternative to using things yourself with an external pointer. Rcpp offers you Rcpp::XPtr and those are used by a few packages you could look at. (My favourite is a quick GitHub search as in this one across the 'cran' org that mirrors CRAN -- it shows over 1k code hits so some further triage needed.)
And XPtr itself seems to have an open issue in as far as guaranteed access to the finalizer is concerned; I filed issue #1108 on that last year and plan to revisit this 'soon'. But despite that issue which may be a corner case, in general this appears to work. So I would look into XPtr and/or the two helper packages for XPtr use on CRAN.
Lastly, Rcpp Classes were an extension written and contributed by John. They have not seen too much use though. Rcpp Modules is fairly widely used, and there too you could look at examples.
Lastly, you could also do something more simple and direct:
create a class Foo with an empty-ish constructor
have static pointer to a singleton of the class
create an init method to set up the class and have it hold the memory it needs
have setter/getter/worker/result methods as needed
create a teardown method to free the memory
and then call init first (maybe from package load), followed by all the work and then ensure the teardown is called. It's a far-from-perfect design (maybe teardown is not called?) but the simplicity gives you a comparison. And once you have the fragments you can build fancier structures on top.

Related

How do I do jumps to a label in an enclosing function in LLVM IR?

I want to do an LLVM compiler for a very old language, PL/M. This has some peculiar features, not least of which is having nested functions with the ability to jump out of an enclosing function. In pseudocode:
toplevel() {
nested() {
if (something)
goto label;
}
nested();
label:
print("finished!");
}
The constraints here are:
you can only jump into the top-level function, luckily
the stack does get unwound (the language does not support destructors, so this is easy)
you do not have to have executed the statement at label before jumping (so the naive setjmp/longjmp method doesn't work).
code at label can be executed normally, i.e. it's not like catch
LLVM has a number of non-local jump mechanisms, such as the exception handling system, but I've never used that. Can this be implemented using LLVM exceptions, or are they not suitable for this? Is there an easier way?
If you want the stack to get unwound, you'll likely want it to be in a separate function, at least a separate LLVM IR function. (The only real exception is if your language does not have a construct like C's "alloca()" and you don't allow calling a nested function by address in which case you could inline it.)
That part of the problem you mentioned, jumping out of an enclosing function, is best handled by having some way for the callee to communicate "how it exited" to the caller, and the caller having a "switch()" on that value. You could stick it in the return value (if it already returns a value, make it a struct of both values), you could add a pointer parameter that it writes to, you could add it a thread-local global variable and fill that in before calling longjmp, or you could use exceptions.
Exceptions, they're complex (I can't describe how to make them work offhand but the docs are here: https://llvm.org/docs/ExceptionHandling.html ) and slow when the exception path is taken, and really intended for exceptional situations, not for normal code. Setjmp/longjmp does the same thing as exceptions except simpler to use and without the performance trade-off when executed, but unfortunately there are miscompiles in LLVM which you need will be the one to fix if you start using them in earnest (see the postscript at the end of the answer).
Those two options cover the ways you can do it without changing the function signature, which may be necessary if your language allows the address to be taken then called later.
If you do need to take the address of nested, then LLVM supports trampolines. See https://llvm.org/docs/LangRef.html#trampoline-intrinsics . Trampolines solve the problem of accessing the local variables of the calling function from the callee, even when the function is called by address.
PS. LLVM miscompiles setjmp/longjmp today. The current model is that a call to setjmp may return twice, and only functions with the returns_twice attribute may return twice. Note that this doesn't affect the whole call stack, only the direct caller of a function that returns twice has to deal with the twice-returning call-- just because function F calls setjmp does not mean that F itself can return twice. So far, so good.
The problem is that in a function with a setjmp, all function calls may themselves call longjmp. I'd say "unless proven otherwise" as with all things in optimizers, but there is no attribute in LLVM doesnotlongjmp or any code within LLVM that attempts to answer the question of whether a function could call longjmp. Adding that would be a good optimization, but it's a separate issue from the miscompile.
If you have code like this pseudo-code:
%entry block:
allocate val
val <- 0
setjmpret <- call setjmp
br i1 setjmpret, %first setjmp return block, %second setjmp return block
%first setjmp return block:
val <- 1;
call foo();
goto after;
%second setjmp return block:
call print(val);
goto after;
%after:
return
The control flow graph shows that is no path from val <- 0 to val <- 1 to print(val). The only path with "print(val)" has "val <- 0" before it therefore constant propagation may turn print(val) into print(0). The problem here is a missing control flow edge from foo() back to the %second setjmp return block. In a function that contains a setjmp, all calls which may call longjmp must have a CFG edge to the second setjmp return block. In LLVM that control flow edge is missing and LLVM miscompiles code because of it.
This problem also manifests in the backend. The first time I heard of this problem it was in the context of the backend losing track of the placement of variables on the stack, and this issue was the underlying root cause.
For the most part setjmp/longjmp seems to work because LLVM isn't usually able to analyze what calling foo() might do and can't perform the optimization. For instance if val was not a fresh allocation but was a pointer, then who's to say that foo() doesn't have access to the same pointer, and then performs "val <- 1" on it? If LLVM can't prove that impossible, that precludes the transform to print(0). Secondly, setjmp/longjmp are just not used often in real code.

When Qt-5 will fail the connect

Reading Qt signal & slots documentation, it seems that the only reason for a new style connection to fail is:
"If there is already a duplicate (exact same signal to the exact same slot on the same objects), the connection will fail and connect will return false"
Which means that connection was already successful the first time and does not allow multi-connections when using Qt::UniqueConnection.
Does this means that Qt-5 style connection will always success? Are there any other reasons for failure?
The new-style connect can still fail at runtime for a variety of reasons:
Either sender or receiver is a null pointer. Obviously this requires a check that can only happen at runtime.
The PMF you specified for a signal is not actually a signal. Lacking proper C++ reflection capabilities, all you can do at compile time is checking that the signal is a non-static member function of the sender's class.
However, that's not enough to make it a signal: it also needs to be in a signals: section in your class definition. When moc sees your class definition, it will generate some metadata containing the information that that function is indeed a signal. So, at runtime, the pointer passed to connect is looked up in a table, and connect itself will fail if the pointer is not found (because you did not pass a signal).
The check on the previous point actually requires a comparison between pointers to member functions. It's a particularly tricky one, because it will typically involve different TUs:
one is the TU containing moc-generated data (typically a moc_class.cpp file). In this TU there's the aforementioned table containing, amongst other things, pointers to the signals (which are just ordinary member functions).
is the TU where you actually invoke connect(sender, &Sender::signal, ...), which generates the pointer that gets looked up in the table.
Now, the two TUs may be in the same application, or perhaps one is in a library and the other in your application, or maybe in two libraries, etc; your platform's ABI starts to get into play.
In theory, the pointers stored when doing 1. are identical to the pointers generated when doing 2.; in practice, we've found cases where this does not happen (cf. this bug report that I reported some time ago, where older versions of GNU ld on ARM generated code that failed the comparison).
For Qt this meant disabling certain optimizations and/or passing some extra flags to the places where we know this to happen and break user software. For instance, as of Qt 5.9, there is no support for -Bsymbolic* flags on GCC on anything but x86 and x86-64.
Of course, this does not mean we've found and fixed all the possible places. New compilers and more aggressive optimizations might trigger this bug again in the future, making connect return false, even when everything is supposed to work.
Yes it can fail if either sender or receiver are not valid objects (nullptr for example)
Example
QObject* obj1 = new QObject();
QObject* obj2 = new QObject();
// Will succeed
connect(obj1, &QObject::destroyed, obj2, &QObject::deleteLater);
delete obj1;
obj1 = nullptr;
// Will fail even if it compiles
connect(obj1, &QObject::destroyed, obj2, &QObject::deleteLater);
Do not try to register pointer type. I've used the macro
#define QT_REG_TYPE(T) qRegisterMetaType<T>(#T)
with pointer type CMyWidget*, that was the problem. Using the type directly worked.
No it's not always successful. The docs give an example here where connect would return false because the signal should not contain variable names.
// WRONG
QObject::connect(scrollBar, SIGNAL(valueChanged(int value)),
label, SLOT(setNum(int value)));

Overriding non-visible S3 method within namespace

First I have a package called DataBaseLayer and it has an S3 method called LoadFromTable(data_request). Second there is another package called RiskCalculator which determines several types of risks and does requests to the database by means of the package DataBaseLayer. Before "triggering" RiskCalculator (by means of an execute function defined in it) a connection to some schema of the database is set up and the method LoadFromTable will refer to that particular schema.
For some tests that I need to perform I have to switch schema depending on the value in data_request that enters LoadFromTable(data_request). Thus what I actually need is to insert a little check in LoadFromTable. As a note, currently there is only a default method implemented, i.e. LoadFromTable.default, and it would thus suffice even to only insert that check in that specific method.
My question is thus twofold:
1. Is there a general way to insert a piece of code before any LoadFromTable method is called, naively said: to insert a piece of code just before UseMethod("LoadFromTable", data_request) is "called".
2. If there is no such way, can we at least insert a piece of code just before LoadFromTable.default is called (for in my case that would now suffice).
As a final note, I can imagine you might say that the whole structure should be changed, and I agree, however, that is not an option for I am not the owner of these packages.
Thanks for your help.
It’s strongly discouraged, and fundamentally the wrong approach, to change code in loaded packages, so I won’t discuss it here (but I’ll mention that this is done via the assignInNamespace function).
But your case can be solved much easier: just override the LoadFromTable generic function in the RiskCalculator package as follows:
LoadFromTable = function (request) {
# TODO: perform your check here.
DataBaseLayer::LoadFromTable(request)
}
Now if you load your RiskCalculator package and call the function either explicitly (via RiskCalculator::LoadFromTable) or implicitly (via LoadFromTable after attaching the RiskCalculator package), your implementation will be called.
Try trace:
library(DataBaseLayer)
trace(LoadFromTable, quote(print("Hello")))
The library statement is important, even if you don't otherwise access that package yourself.

Can Code be Protected From Rogue Callers In Ada?

I'm a fairly new Ada programmer. I have read the book by Barnes (twice I might add) and even managed to write a fair terminal program in Ada. My main language is C++ though.
I am currently wondering if there is a way to "protect" subroutine calls in Ada, perhaps in Ada 2012 (of which I know basically nothing). Let me explain what I mean (although in C++ terms).
Suppose you have a class Secret like this:
class Secret
{
private:
int secret_int;
public:
Set_Secret_Value( int i );
}
Now this is the usual stuff, dont expose secret_int, manipulate it only through access functions. However, the problem is that anybody with access to an object of type Secret can manipulate the value, whether that particular code section is supposed to do it or not. So the danger of rogue altering of secret_int has been reduced to anybody altering secret_int through the permitted functions, even if it happens in a code section that's not supposed to manipulate it.
To remedy that I came up with the following construct
class Secret
{
friend class Secret_Interface;
private:
int secret_int;
Set_Secret_Value( int i );
Super_Secret_Function();
};
class Secret_Interface
{
friend class Client;
private:
static Set_Secret_Value( Secret &rc_secret_object, int i )
{
rc_secret_object.Set_Secret( i );
}
};
class Client
{
Some_Function()
{
...
Secret_Interface::Set_Secret_Value( c_object, some-value );
...
}
}
Now the class Secret_Interface can determine which other classes can use it's private functions and by doing so, indirectly, the functions of class Secret that are exposed to Secret_Interface. This way class Secret still has private functions that can not be called by anybody outside the class, for instance function Super_Secret_Function().
Well I was wondering if anything of this sort is possible in Ada. Basically my desire is to be able to say:
Code A may only be executed by code B but not by anybody else
Thanks for any help.
Edit:
I add a diagram here with a program structure like I have in mind that shows that what I mean here is a transport of a data structure across a wide area of the software, definition, creation and use of a record should happen in code sections that are otherwise unrleated
I think the key is to realize that, unlike C++ and other languages, Ada's primary top-level unit is the package, and visibility control (i.e. public vs. private) is on a per-package basis, not a per-type (or per-class) basis. I'm not sure I'm saying that correctly, but hopefully things will be explained below.
One of the main purposes of friend in C++ is so that you can write two (or more) closely related classes that both take part in implementing one concept. In that case, it makes sense that the code in one class would be able to have more direct access to the code in another class, since they're working together. I assume that in your C++ example, Secret and Client have that kind of close relationship. If I understand C++ correctly, they do all have to be defined in the same source file; if you say friend class Client, then the Client class has to be defined somewhere later in the same source file (and it can't be defined earlier, because at that point the methods in Secret or Secret_Interface haven't yet been declared).
In Ada, you can simply define the types in the same package.
package P is
type Secret is tagged private;
type Client is tagged private;
-- define public operations for both types
private
type Secret is tagged record ... end record;
type Client is tagged record ... end record;
-- define private operations for either or both types
end P;
Now, the body of P will contain the actual code for the public and private operations of both types. All code in the package body of P has access to those things defined in P's private part, regardless of which type they operate on. And, in fact, all code has access to the full definitions of both types. This means that a procedure that operates on a Client can call a private operation that operates on a Secret, and in fact it can read and write a Secret's record components directly. (And vice versa.) This may seem bizarre to programmers used to the class paradigm used by most other OOP languages, but it works fine in Ada. (In fact, if you don't need Secret to be accessible to anything else besides the implementation of Client, the type and its operations can be defined in the private part of P, or the package body.) This arrangement doesn't violate the principles behind OOP (encapsulation, information hiding), as long as the two types are truly two pieces of the implementation of one coherent concept.
If that isn't what you want, i.e. if Secret and Client aren't that closely related, then I would need to see a larger example to find out just what kind of use case you're trying to implement.
MORE THOUGHTS: After looking over your diagram, I think that the way you're trying to solve the problem is inferior design--an anti-pattern, if you will. When you write a "module" (whatever that means--a class or package, or in some cases two or more closely related classes or packages cooperating with each other), the module defines how other modules may use it--what public operations it provides on its objects, and what those operations do.
But the module (let's call it M1) should work the same way, according to its contract, regardless of what other module calls it, and how. M1 will get a sequence of "messages" instructing it to perform certain tasks or return certain information; M1 should not care where those messages are coming from. In particular, M1 should not be making decisions about the structure of the clients that use it. By having M1 decree that "procedure XYZ can only be called from package ABC", M1 is imposing structural requirements on the clients that use it. This, I believe, causes M1 to be too tightly coupled to the rest of the program. It is not good design.
However, it may make sense for the module that uses M1 to exercise some sort of control like that, internally. Suppose we have a "module" M2 that actually uses a number of packages as part of its implementation. The "main" package in M2 (the one that clients of M2 use to get M2 to perform its task) uses M1 to create a new object, and then passes that object to several other packages that do the work. It seems like a reasonable design goal to find a way that M2 could pass that object to some packages or subprograms without giving them the ability to, say, update the object, but pass it to other packages or subprograms that would have that ability.
There are some solutions that would protect against most accidents. For example:
package M1 is
type Secret is tagged private;
procedure Harmless_Operation (X : in out Secret);
type Secret_With_Updater is new Secret with null record;
procedure Dangerous_Operation (X : in out Secret_With_Updater);
end M1;
Now, the packages that could take a "Secret" object but should not have the ability to update it would have procedures defined with Secret'Class parameters. M2 would create a Secret_With_Updater object; since this object type is in Secret'Class, it could be passed as a parameter to procedures with Secret'Class parameters. However, those procedures would not be able to call Dangerous_Operation on their parameters; that would not compile.
A package with a Secret'Class parameter could still call the dangerous operation with a type conversion:
procedure P (X : in out Secret'Class) is
begin
-- ...
M1.Secret_With_Updater(X).Dangerous_Operation;
-- ...
end P;
The language can't prevent this, because it can't make Secret_With_Updater visible to some packages but not others (without using a child package hierarchy). But it would be harder to do this accidentally. If you really wish to go further and prevent even this (if you think there will be a programmer whose understanding of good design principles is so poor that they'd be willing to write code like this), then you could go a little further:
package M1 is
type Secret is tagged private;
procedure Harmless_Operation (X : in out Secret);
type Secret_Acc is access all Secret;
type Secret_With_Updater is tagged private;
function Get_Secret (X : Secret_With_Updater) return Secret_Acc;
-- this will be "return X.S"
procedure Dangerous_Operation (X : in out Secret_With_Updater);
private
-- ...
type Secret_With_Updater is tagged record
S : Secret_Acc;
end record;
-- ...
end M1;
Then, to create a Secret, M2 would call something that creates a Secret_With_Updater that returns a record with an access to a Secret. It would then pass X.Get_Secret to those procedures which would not be allowed to call Dangerous_Operation, but X itself to those that would be allowed. (You might also be able to declare S : aliased Secret, declare Get_Secret to return access Secret, and implement it with return X.S'access. This may avoid a potential memory leak, but it may also run into accessibility-check issues. I haven't tried this.)
Anyway, perhaps some of these ideas could help accomplish what you want to accomplish without introducing unnecessary coupling by forcing M1 to know about the structure of the application that uses it. It's hard to tell because your description of the problem, even with the diagram, is still at too abstract a level for me to see what you really want to do.
You could do this by using child packages:
package Hidden is
private
A : Integer;
B : Integer;
end Hidden;
and then
package Hidden.Client_A_View is
function Get_A return Integer;
procedure Set_A (To : Integer);
end Hidden.Client_A_View;
Then, Client_A can write
with Hidden.Client_A_View;
procedure Client_A is
Tmp : Integer;
begin
Tmp := Hidden.Client_A_View.Get_A;
Hidden.Client_A_View.Set_A (Tmp + 1);
end Client_A;
Your question is extremely unclear (and all the C++ code doesn't help explaining what you need), but if your point is that you want a type to have some publicly accessible operations, and some private operations, then it is easily done:
package Example is
type Instance is private;
procedure Public_Operation (Item : in out Instance);
private
procedure Private_Operation (Item : in out Instance);
type Instance is ... -- whatever you need it to be
end Example;
The procedure Example.Private_Operation is accessible to children of Example. If you want an operation to be purely internal, you declare it only in the package body:
package body Example is
procedure Internal_Operation (Item : in out Instance);
...
end Example;
Well I was wondering if anything of this sort is possible in Ada. Basically my desire is to be able to say:
Code A may only be executed by code B but not by anybody else
If limited to language features, no.
Programmatically, code execution can be protected if the provider must be provided an approved "key" to allow execution of its services, and only authorized clients are supplied with such keys.
Devising the nature, generation, and security of such keys is left as an exercise for the reader.

Passing a managed class member to a c++ method

I have a C++ dll with the following method:
//C++ dll method (external)
GetServerInterface(ServerInterface* ppIF /*[OUT]*/)
{
//The method will set ppIF
}
//ServerInterface is defined as:
typedef void * ServerInterface;
To access the dll from a C# project, I created a C++/CLI project and declared a managed class as follows:
public ref class ComWrapperManager
{
//
//
ServerInterface _serverInterface;
void Connect();
//
//
}
I use the Connect() method to call GetServerInterface as shown below. The first call works, the second doesn't. Can someone explain why? I need to persist that pointer as a member variable in the managed class. Any better way to do this?
void Connect()
{
ServerInterface localServerInterface;
GetServerInterface(&localServerInterface); //THIS WORKS
GetServerInterface(&_serverInterface); //THIS DOESNT
//Error 1 error C2664: 'ServerInterface ' :
//cannot convert parameter 1 from //'cli::interior_ptr<Type>'
//to 'ServerInterface *'
}
You are passing a pointer to a member of a managed object. Such pointers are special, known as interior pointers. They are tracked by the garbage collector, it will modify the pointer value when the managed object is moved when the GC compacts the heap.
Problem is, you are passing that pointer to unmanaged code. The GC is not capable of modifying the copy of the pointer value that the native code is using. Now disaster strikes when another thread triggers a garbage collection, just when the native code is executing and dereferences the pointer. The object no longer exists at the original address. Very, very bad. And extremely hard to diagnose since it is so unlikely to happen.
The compiler can see you making this mistake. And complains with C2664.
The workaround is to pass a pointer that's stored in a memory location that's not going to get moved by the GC. Such a location is very easy to come by, a local variable qualifies. It is stored on the stack, it isn't going to be moved. So make it look like this instead:
void Connect()
{
ServerInterface temp;
GetServerInterface(&temp);
this->_serverInterface = temp;
// etc..
}
Which you already discovered yourself, just don't forget to assign the class member.
Here's why you can't do the second one: _serverInterface is a void pointer that is part of a managed class. Think about what the garbage collector does... It's allowed to move the managed objects around in memory however it wants, so the address of the void pointer can change from moment to moment. Therefore, it's not valid to use that address.
There are two solutions to this:
As you noted, where you can pass the address of a stack variable to the unmanaged method. Unlike managed objects, the stack doesn't move when the garbage collector does its thing, so the address doesn't change. You can then take the data stored in the stack variable and copy it to the class field, and that works fine, because you're not dealing with the address of it.
As the other answerer noted, you can lock your managed object in memory. Once it can't move, you can take the address of the void pointer field without issue. (He's showing C# syntax where you're looking for C++/CLI syntax. I'm not at a compiler to check, but I believe that the C++/CLI syntax is not the same.)
Of the two solutions, I prefer #1, the one you already have implemented: Solution #2 introduces a block of unmovable memory in the middle of the space that the garbage collector wants to rearrange. Given a choice, I prefer not to hamstring the garbage collector.

Resources