Assigning Rcpp::XPtr on the R side

I'm learning about external pointers, XPtr, in Rcpp. I made the following test functions:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
Rcpp::XPtr<arma::Mat<int>> create_xptr(int i, int j) {
  arma::Mat<int>* ptr(new arma::Mat<int>(i, j));
  Rcpp::XPtr<arma::Mat<int>> p(ptr, true);
  return p;
}

// [[Rcpp::export]]
void fill_xptr(Rcpp::XPtr<arma::Mat<int>>& xptr, int k) {
  (*xptr).fill(k);
}

// [[Rcpp::export]]
arma::Mat<int> return_val(Rcpp::XPtr<arma::Mat<int>>& xptr) {
  return *xptr;
}
Now on the R side I can of course create an instance and work with it:
x <- create_xptr(1000, 1000) # (1)
Say, for some reason, I accidentally called create_xptr again and assigned the result to x, i.e.
x <- create_xptr(1000, 1000) # (2)
Then I no longer have access to the pointer I created in (1), which makes sense, and hence I cannot free the memory. What I would like is for the second call (2) to simply overwrite the first one (1). And secondly, if I create an external pointer in some local scope (say a simple for loop), should the memory used then be freed automatically when it goes out of scope? I've tried the following:
for (i in 1:3) {
  a <- create_xptr(1000, 100000)
  fill_xptr(a, 1)
}
But the memory usage just grows with each iteration.
I have tried reading code in various Git repos and old posts here on Stack Overflow, and I have read a little about finalizers and garbage collection in R, but I can't seem to put the puzzle together.

We use external pointers for things that do not already have existing interfaces, such as database connections or objects from other libraries. You may be better off looking at some existing uses of XPtr (or the other external pointer variants in some other packages; there are two small ones on CRAN).
And I can't really think of an example that references this directly from R. It is mostly for "wrapping" external objects in order to hold on to them and pass them around for further use elsewhere. And you are correct that you need to read up a little on finalizers. I find reading Writing R Extensions, as dense as it is, to be the best source, because you need to get the initial "basics" in C right first.
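To make the finalizer behaviour concrete, here is a minimal sketch, not from the answer above, assuming a current RcppArmadillo setup and the Rcpp::XPtr template that accepts a custom finalizer as a template parameter. Passing true to the constructor registers the finalizer with R's garbage collector, so the matrix allocated in call (1) is released once nothing in R references that external pointer any more (for example after the reassignment in (2), the next time gc() runs); the default finalizer already performs the delete, the custom one only makes it visible. The alias and function names are hypothetical:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Custom finalizer: runs when R garbage-collects the external pointer.
void mat_finalizer(arma::Mat<int>* p) {
  Rcpp::Rcout << "finalizing matrix" << std::endl;
  delete p;                                  // free the heap allocation
}

// Hypothetical alias for an XPtr that uses the custom finalizer.
using MatXPtr = Rcpp::XPtr<arma::Mat<int>, Rcpp::PreserveStorage, mat_finalizer>;

// [[Rcpp::export]]
MatXPtr create_xptr_verbose(int i, int j) {
  arma::Mat<int>* ptr = new arma::Mat<int>(i, j);
  return MatXPtr(ptr, true);                 // true: register mat_finalizer with R's GC
}
On the R side, reassigning the variable and calling gc() should then print the message, confirming that the memory from the first call is freed as soon as the old pointer becomes unreachable; the same mechanism frees the matrices created inside the for loop once gc() runs.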

Related

Good practice on how to store the result of a function for later use in R

I have the situation where I have written an R function, ComplexResult, that computes a computationally expensive result that two other separate functions will later use, LaterFuncA and LaterFuncB.
I want to store the result of ComplexResult somewhere so that both LaterFuncA and LaterFuncB can use it, and it does not need to be recalculated. The result of ComplexResult is a large matrix that only needs to be calculated once, then re-used later on.
R is my first foray into the world of functional programming, so I'm interested to understand what is considered good practice. My first line of thinking is as follows:
# run ComplexResult and get the result
cmplx.res <- ComplexResult(arg1, arg2)
# store the result in the global environment.
# NB this would not be run from a function
assign("CachedComplexResult", cmplx.res, envir = .GlobalEnv)
Is this at all the right thing to do? The only other approach I can think of is having a large "wrapper" function, e.g.:
MyWrapperFunction <- function(arg1, arg2) {
  cmplx.res <- ComplexResult(arg1, arg2)
  res.a <- LaterFuncA(cmplx.res)
  res.b <- LaterFuncB(cmplx.res)
  # do more stuff here ...
}
Thoughts? Am I heading at all in the right direction with either of the above? Or is there an Option C which is more cunning? :)
The general answer is that you should serialize/deserialize your big object for later use. The R way to do this is with saveRDS/readRDS:
## save a single object to file
saveRDS(cmplx.res, "cmplx.res.rds")
## restore it under a different name
cmplx2.res <- readRDS("cmplx.res.rds")
This assigns to the global environment:
CachedComplexResult <- ComplexResult(arg1, arg2)
To store I would use:
write.table(CachedComplexResult, file = "complex_res.txt")
And then to use it directly:
LaterFuncA(read.table("complex_res.txt"))
Your approach works for saving to local memory; other answers have explained saving to global memory or a file. Here are some thoughts on why you would do one or the other.
Save to file: this is the slowest, so only do it if your process is volatile and you expect it to crash hard and need to pick up the pieces where it left off, or if you just need to save the state once in a while and speed/performance is not a concern.
Save to global: if you need access from multiple spots in a large R program.

Calling the original function after tweaking arguments is an incorrect use of the R-API?

I am trying to create a small extension for R here for embedding the current time on the R prompt: https://github.com/musically-ut/extPrompt
Things seem to be working overall, but R CMD check . raised a warning:
File '[truncated]..Rcheck/extPrompt/libs/extPrompt.so':
Found non-API call to R: 'ptr_R_ReadConsole'
Compiled code should not call non-API entry points in R.
The concerned file is this: https://github.com/musically-ut/extPrompt/blob/master/src/extPrompt.c and occurs on line 38, I think.
void extPrompt() {
    // Initialize the plugin by replacing the R_ReadConsole function
    old_R_ReadConsole = ptr_R_ReadConsole;
    ptr_R_ReadConsole = extPrompt_ReadConsole;
    // ...
}

int extPrompt_ReadConsole(const char *old_prompt, unsigned char *buf, int len,
                          int addtohistory) {
    // ...
    // Call the old function with the `new_prompt`
    return (*old_R_ReadConsole)(new_prompt, buf, len, addtohistory);
}
I am trying to make the R_ReadConsole API call. However, since a different plugin (like mine) could already have overridden it, I do not want to invoke R_ReadConsole directly but rather the function that was previously at ptr_R_ReadConsole.
Is this an incorrect use of the API?
From the r-devel mailing list:
As for incorrectly using the API - yes. Since R CMD check is telling you this is a problem, it is officially a problem. This is of no consequence if the package works for you and any users of it, and the package is not to be hosted on CRAN. However, using this symbol in your code may not work on all platforms, and is not guaranteed to work in the future (but probably will!).

What is the motivation behind this "pattern"?

I'm a bit confused when I see code such as:
bigBox := &BigBox{}
bigBox.BubbleGumsCount = 4 // correct...
bigBox.SmallBox.AnyMagicItem = true // also correct
Why, or when, would I want to do bigBox := &BigBox{} instead of bigBox := BigBox{}? Is it more efficient in some way?
Code sample was taken from here.
Sample no.2:
package main

import "fmt"

type Ints struct {
    x int
    y int
}

func build_struct() Ints {
    return Ints{0, 0}
}

func build_pstruct() *Ints {
    return &Ints{0, 0}
}

func main() {
    fmt.Println(build_struct())
    fmt.Println(build_pstruct())
}
Sample no. 3: (why would I go with &BigBox in this example, and not with BigBox as a struct directly?)
func main() {
    bigBox := &BigBox{}
    bigBox.BubbleGumsCount = 4
    fmt.Println(bigBox.BubbleGumsCount)
}
Is there ever a reason to call build_pstruct instead of the build_struct variant? Isn't that why we have the GC?
I figured out one motivation for this kind of code: avoidance of "struct copying by accident".
If you use a struct variable to hold the newly created struct:
bigBox := BigBox{}
you may copy the struct by accident like this
myBox := bigBox // where you just want a reference to bigBox
myBox.BubbleGumsCount = 4
or like this
changeBoxColorToRed(bigBox)
where changeBoxColorToRed is
// It makes a copy of the entire struct passed as the parameter.
func changeBoxColorToRed(box BigBox) {
    // !!!! This function is buggy. It won't work as expected !!!
    // Please see the fix at the end.
    box.Color = red
}
But if you use a struct pointer:
bigBox := &BigBox{}
there will be no copying in
myBox := bigBox
and
changeBoxColorToRed(bigBox)
will fail to compile, giving you a chance to rethink the design of changeBoxColorToRed. The fix is obvious:
func changeBoxColorToRed(box *BigBox) {
    box.Color = red
}
The new version of changeBoxColorToRed does not copy the entire struct and works correctly.
bb := &BigBox{} creates a struct, but sets the variable to be a pointer to it. It's the same as bb := new(BigBox). On the other hand, bb := BigBox{} makes bb a variable of type BigBox directly. If you want a pointer (perhaps because you're going to use the data via a pointer), then it's better to make bb a pointer, otherwise you're going to be writing &bb a lot. If you're going to use the data as a struct directly, then you want bb to be a struct, otherwise you're going to be dereferencing with *bb.
It's off the point of the question, but it's usually better to create data in one go, rather than incrementally by creating the object and subsequently updating it.
bb := &BigBox{
    BubbleGumsCount: 4,
    SmallBox: SmallBox{ // assuming the field's type is also named SmallBox; the type can't be elided in a struct field literal
        AnyMagicItem: true,
    },
}
The & takes the address of something, so it means "I want a pointer to" rather than "I want an instance of". The size of a variable containing a value depends on the size of the value, which could be large or small. The size of a variable containing a pointer is 8 bytes (on a 64-bit platform).
Here are examples and their meanings:
bigBox0 := &BigBox{} // bigBox0 is a pointer to an instance of BigBox{}
bigBox1 := BigBox{}  // bigBox1 contains an instance of BigBox{}
bigBox2 := bigBox1   // bigBox2 is a copy of bigBox1
bigBox3 := &bigBox1  // bigBox3 is a pointer to bigBox1
bigBox4 := *bigBox3  // bigBox4 is a copy of bigBox1, dereferenced from bigBox3 (a pointer)
Why would you want a pointer?
To prevent copying a large object when passing it as an argument to a function.
You want to modify the value by passing it as an argument.
To keep a slice, backed by an array, small. [10]BigBox would take up "the size of BigBox" * 10 bytes. [10]*BigBox would take up 8 bytes * 10. A slice when resized has to create a larger array when it reaches its capacity. This means the memory of the old array has to be copied to the new array.
Why would you not want to use a pointer?
If an object is small, it's better just to make a copy. Especially if it's <= 8 bytes.
Using pointers can create garbage. This garbage has to be collected by the garbage collector. The garbage collector is a mark-and-sweep, stop-the-world implementation. This means that it has to freeze your application to collect the garbage. The more garbage it has to collect, the longer that pause is. This individual, for example, experienced pauses of up to 10 seconds.
Copying an object uses the stack rather than the heap. The stack is almost always faster than the heap. You really don't have to think about stack vs. heap in Go, as the compiler decides what goes where, but you shouldn't ignore it either. It really depends on the compiler implementation, but pointers can result in memory going on the heap, resulting in the need for garbage collection.
Direct memory access is faster. If you have a slice []BigBox and it doesn't change size it can be faster to access. []BigBox is faster to read, whereas []*BigBox is faster to resize.
My general advice is use pointers sparingly. Unless you're dealing with a very large object that needs to be passed around, it's often better to pass around a copy on the stack. Reducing garbage is a big deal. The garbage collector will get better, but you're better off by keeping it as low as possible.
As always test your application and profile it.
The difference is between creating a reference object (with the ampersand) vs. a value object (without the ampersand).
There's a nice explanation of the general concept of value vs. reference type passing here... What's the difference between passing by reference vs. passing by value?
There is some discussion of these concepts with regards to Go here... http://www.goinggo.net/2013/07/understanding-pointers-and-memory.html
In general there is no difference between &BigBox{} and BigBox{}. The Go compiler is free to do whatever it likes as long as the semantics are correct.
func StructToStruct() {
    s := Foo{}
    StructFunction(&s)
}

func PointerToStruct() {
    p := &Foo{}
    StructFunction(p)
}

func StructToPointer() {
    s := Foo{}
    PointerFunction(&s)
}

func PointerToPointer() {
    p := &Foo{}
    PointerFunction(p)
}

// passed as a pointer, but used as a struct
func StructFunction(f *Foo) {
    fmt.Println(*f)
}

func PointerFunction(f *Foo) {
    fmt.Println(f)
}
Summary of the assembly:
StructToStruct: 13 lines, no allocation
PointerToStruct: 16 lines, no allocation
StructToPointer: 20 lines, heap allocated
PointerToPointer: 12 lines, heap allocated
With a perfect compiler the *ToStruct functions would be identical, as would the *ToPointer functions. Go's escape analysis is good enough to tell if a pointer escapes, even across module boundaries. Whichever way is most efficient is the way the compiler will do it.
If you're really into micro-optimization note that Go is most efficient when the syntax lines up with the semantics (struct used as a struct, pointer used as a pointer). Or you can just forget about it and declare the variable the way it will be used and you will be right most of the time.
Note: if Foo is really big, PointerToStruct will heap-allocate it. The spec threatens that even StructToStruct is allowed to do this, but I couldn't make it happen. The lesson here is that the compiler will do whatever it wants. Just as the details of the registers are shielded from the code, so is the state of the heap/stack. Don't change your code because you think you know how the compiler is going to use the heap.

Extract long[] from R object

I'm trying to make a wrapper for some C-based sparse-matrix-handling code (see previous question). In order to call the workhorse C function, I need to create a structure that looks like this:
struct smat {
    long rows;
    long cols;
    long vals;      /* Total non-zero entries. */
    long *pointr;   /* For each col (plus 1), index of first non-zero entry. */
    long *rowind;   /* For each nz entry, the row index. */
    double *value;  /* For each nz entry, the value. */
};
These correspond nicely to the slots in a dgCMatrix sparse matrix. So ideally I'd just point to the internal arrays in the dgCMatrix (after verifying that the C function won't twiddle with the data [which I haven't done yet]).
For *value, it looks like I'll be able to use REALSXP or something to get a double[] as desired. But for *pointr and *rowind, I'm not sure of the best way to get at an appropriate array. Will I need to loop through the entries and copy them into new arrays, casting as I go? Or can Rcpp provide some sugar here? This is the first time I've really used Rcpp much, and I'm not well-versed in it yet.
Thanks.
Edit: I'm also having some linking trouble that I don't understand:
Error in dyn.load(libLFile) :
unable to load shared object '/var/folders/TL/TL+wXnanH5uhWm4RtUrrjE+++TM/-Tmp-//RtmpAA9upc/file2d4606aa.so':
dlopen(/var/folders/TL/TL+wXnanH5uhWm4RtUrrjE+++TM/-Tmp-//RtmpAA9upc/file2d4606aa.so, 6): Symbol not found: __Z8svdLAS2AP4smatl
Referenced from: /var/folders/TL/TL+wXnanH5uhWm4RtUrrjE+++TM/-Tmp-//RtmpAA9upc/file2d4606aa.so
Expected in: flat namespace
in /var/folders/TL/TL+wXnanH5uhWm4RtUrrjE+++TM/-Tmp-//RtmpAA9upc/file2d4606aa.so
Do I need to be creating my library with some special compilation flags?
Edit 2: it looks like my libargs parameter has no effect, so libsvd symbols never make it into the library. I can find no way to include libraries using cxxfunction() - here's what I'd tried, but the extra parameters (wishful-thinkingly-borrowed from cfunction()) are silently gobbled up:
fn <- cxxfunction(sig=c(nrow="integer", mi="long", mp="long", mx="numeric"),
                  body=code,
                  includes="#include <svdlib.h>\n",
                  cppargs="-I/Users/u0048513/Downloads/SVDLIBC",
                  libargs="-L/Users/u0048513/Downloads/SVDLIBC -lsvd",
                  plugin="Rcpp",
                  verbose=TRUE)
I feel like I'm going about this whole process wrong, since nothing's working. Anyone kick me in the right direction?
I decided to also post a query on the Rcpp-devel mailing list, and got some good advice & help from Dirk and Doug:
http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2011-February/001851.html
I'm still not super-facile with this stuff, but getting there. =)
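For anyone landing here later, a rough sketch of the copy-and-cast route follows. It is only an illustration under assumptions, not code from that thread: the @p and @i slots of a dgCMatrix come across as R integer vectors (int), so they cannot be pointed to directly as long*, while the @x slot is already double and can be used as is. The function name and the commented-out workhorse call are hypothetical placeholders.
#include <Rcpp.h>
#include <vector>

// [[Rcpp::export]]
int smat_from_dgc(Rcpp::S4 m) {
  Rcpp::IntegerVector dims = m.slot("Dim");
  Rcpp::IntegerVector p    = m.slot("p");   // column pointers (int)
  Rcpp::IntegerVector i    = m.slot("i");   // row indices (int)
  Rcpp::NumericVector x    = m.slot("x");   // non-zero values (double)

  // Copy/cast the int slots into long buffers for the C struct.
  std::vector<long> pointr(p.begin(), p.end());
  std::vector<long> rowind(i.begin(), i.end());

  // struct smat s;
  // s.rows   = dims[0];  s.cols = dims[1];  s.vals = x.size();
  // s.pointr = pointr.data();
  // s.rowind = rowind.data();
  // s.value  = x.begin();   // double* can be used directly
  // ... call the workhorse C routine on &s here ...

  return (int) rowind.size();               // e.g. report the number of non-zeros
}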
I've done something similar for an R-Smalltalk interface last year and went about it more generically, passing all data back and forth as byte arrays:
In C I have:
DLLIMPORT void getLengthOfNextMessage(byte* a);
DLLIMPORT void getNextMessage(byte* a);
In R:
getLengthOfNextMessage <- function() {
  tmp1 <- as.raw(rep(0, 4))
  tmp2 <- .C("getLengthOfNextMessage", tmp1)
  return(bvToInt(tmp2))
}

receiveMessage <- function() {
  #if (getNumberOfMessages() == 0) {
  #  print("error: no messages")
  #  return()
  #}
  tmp1 <- as.raw(rep(0, getLengthOfNextMessage() + getSizeOfMessages()))
  tmp2 <- .C("getNextMessage", tmp1)
  msg <- as.raw(tmp2[[1]])
  print(":::confirm received")
  print(bvToInt(msg[13:16]))
  # confirmReceived(bvToInt(msg[13:16]))
  return(msg)
}
I have commented out the use of the functions getNumberOfMessages() and confirmReceived(), which are specific to the problem I had to solve (multiple back-and-forth communication). Essentially, the code uses the argument byte array to transfer the information: first the 4-byte length info, then the actual data. This seems less elegant (even to me) than using structs, but I found it to be more generic, and I can hook into any DLL, transferring any datatype.

How to reverse a QList?

I see qCopy and qCopyBackward, but neither seems to let me make a copy in reverse order. qCopyBackward only copies backwards but keeps the darn elements in the same order! All I want to do is return a copy of the list in reverse order. There has to be a function for that, right?
If you don't like the QTL, just use the STL. They might not have a Qt-ish API, but the STL API is rock-stable :) That said, qCopyBackward is just std::copy_backward, so at least they're consistent.
Answering your question:
template <typename T>
QList<T> reversed( const QList<T> & in ) {
    QList<T> result;
    result.reserve( in.size() ); // reserve() is new in Qt 4.7
    std::reverse_copy( in.begin(), in.end(), std::back_inserter( result ) );
    return result;
}
EDIT 2015-07-21: Obviously (or maybe not), if you want a one-liner (and people seem to prefer that, looking at the relative upvotes of the different answers after five years) and you have a non-const list, the above collapses to
std::reverse(list.begin(), list.end());
But I guess the index fiddling stuff is better for job security :)
Reverse your QList with a single line:
for(int k = 0; k < (list.size()/2); k++) list.swap(k,list.size()-(1+k));
[Rewrite from original]
It's not clear if OP wants to know "How [do I] reverse a QList?" or actually wants a reversed copy. User mmutz gave the correct answer for a reversed copy, but if you just want to reverse the QList in place, there's this:
#include <algorithm>
And then
std::reverse(list.begin(), list.end());
Or in C++11:
std::reverse(std::begin(list), std::end(list));
The beauty of the C++ standard library (and templates in general) is that the algorithms and containers are separate. At first it may seem annoying that the standard containers (and to a lesser extent the Qt containers) don't have convenience functions like list.reverse(), but consider the alternatives. Which is more elegant: providing a reverse() method on every container, or defining a standard interface for bidirectional iteration and providing one reverse() implementation that works for every container supporting it?
To illustrate why this is an elegant approach, consider the answers to some similar questions:
"How do you reverse a std::vector<int>?":
std::reverse(std::begin(vec), std::end(vec));
"How do you reverse a std::deque<int>?":
std::reverse(std::begin(deq), std::end(deq));
What about portions of the container?
"How do you reverse the first seven elements of a QList?": Even if the QList authors had given us a convenience .reverse() method, they probably wouldn't have given us this functionality, but here it is:
if (list.size() >= 7) {
    std::reverse(std::begin(list), std::next(std::begin(list), 7));
}
But it gets better: because the iterator interface is the same as C pointer syntax, and because C++11 added the free std::begin() and std::end() functions, you can do these:
"How do you reverse an array float x[10]?":
std::reverse(std::begin(x), std::end(x));
or pre C++11:
std::reverse(x, x + sizeof(x) / sizeof(x[0]));
(That is the ugliness that std::end() hides for us.)
Let's go on:
"How do you reverse a buffer float* x of size n?":
std::reverse(x, x + n);
"How do you reverse a null-terminated string char* s?":
std::reverse(s, s + strlen(s));
"How do you reverse a not-necessarily-null-terminated string char* s in a buffer of size n?":
std::reverse(s, std::find(s, s + n, '\0'));
Note that std::reverse uses swap() so even this will perform pretty much as well as it possibly could:
QList<QList<int> > bigListOfBigLists;
....
std::reverse(std::begin(bigListOfBigLists), std::end(bigListOfBigLists));
Also note that these should all perform as well as a hand-written loop since, where possible, the compiler will turn them into pointer arithmetic. Also, you can't cleanly write a reusable, generic, high-performance reverse function like this in C.
Marc Jentsch's answer is good. And if you want an additional 30% performance boost, you can change his one-liner to:
for(int k=0, s=list.size(), max=(s/2); k<max; k++) list.swap(k,s-(1+k));
On a ThinkPad W520 with a QList of 10 million QTimers I got these numbers:
reversing list stack overflow took 194 ms
reversing list stack overflow with max and size took 136 ms
The boost is a result of:
the expression (list.size()/2) being calculated only once, when initializing the loop, rather than after every step
the expression list.size() in swap() being called only once, when initializing the loop, rather than after every step
You can use the Java style iterator. Complete example here (http://doc.qt.digia.com/3.2/collection.html). Look for the word "reverse".
QList<int> list; // initial list
list << 1;
list << 2;
list << 3;

QList<int> rlist; // reversed list
QListIterator<int> it(list);
it.toBack(); // start at the end; a fresh iterator sits at the front, so hasPrevious() would be false
while (it.hasPrevious()) {
    rlist << it.previous();
}
Reversing a QList is going to be O(n) however you do it, since QList isn't guaranteed to have its data stored contiguously in memory (unlike QVector). You might consider just traversing the list in backwards order where you need to, or use something like a QStack which lets you retrieve the elements in the opposite order they were added.
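If traversing backwards is all you need, here is a small sketch of that suggestion, assuming a Qt version where QList provides crbegin()/crend() (Qt 5.6 and later); no reversed copy is ever made:
#include <QList>
#include <QDebug>

void printBackwards(const QList<int> &list) {
    // Walk from the last element to the first without copying the list.
    for (auto it = list.crbegin(); it != list.crend(); ++it)
        qDebug() << *it;
}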
For standard library lists it would look like this
std::list<X> result;
std::copy(list.rbegin(), list.rend(), std::back_inserter(result));
Unfortunately, Qt (at least in older versions) doesn't have rbegin and rend functions that return reverse iterators (the ones that go from the end of the container to its beginning). You may write them, or you can just write the copying function on your own -- reversing a list is a nice exercise. Or you can note that QList is actually an array, which makes writing such a function trivial. Or you can convert the list to a std::list and use rbegin and rend. Choose whatever you like.
As of Qt 5.14 (circa 2020), QList provides a constructor that takes iterators, so you can just construct a reversed copy of the list with the reverse iterators of the source:
QList<int> backwards(forwards.rbegin(), forwards.rend());
Or if you want to be able to inline it, more generically (replace QList<I> with just I if you want to be super duper generic):
template <typename I> QList<I> reversed (const QList<I> &forwards) {
return QList<I>(forwards.rbegin(), forwards.rend());
}
Which lets you do fun one-liners with temporaries like:
QString badDay = "reporter covers whale exploding";
QString worseDay = reversed(badDay.split(' ')).join(' ');
