I'm trying to make a wrapper for some C-based sparse-matrix-handling code (see previous question). In order to call the workhorse C function, I need to create a structure that looks like this:
struct smat {
long rows;
long cols;
long vals; /* Total non-zero entries. */
long *pointr; /* For each col (plus 1), index of first non-zero entry. */
long *rowind; /* For each nz entry, the row index. */
double *value; /* For each nz entry, the value. */
};
These correspond nicely to the slots in a dgCMatrix sparse matrix. So ideally I'd just point to the internal arrays in the dgCMatrix (after verifying that the C function won't twiddle with the data [which I haven't done yet]).
For *value, it looks like I'll be able to use REALSXP or something to get a double[] as desired. But for *pointr and *rowind, I'm not sure the best way to get at an appropriate array. Will I need to loop through the entries and copy them to new arrays, casting as I go? Or can Rcpp provide some sugar here? This is the first time I've really used Rcpp much and I'm not well-versed in it yet.
Thanks.
Edit: I'm also having some linking trouble that I don't understand:
Error in dyn.load(libLFile) :
unable to load shared object '/var/folders/TL/TL+wXnanH5uhWm4RtUrrjE+++TM/-Tmp-//RtmpAA9upc/file2d4606aa.so':
dlopen(/var/folders/TL/TL+wXnanH5uhWm4RtUrrjE+++TM/-Tmp-//RtmpAA9upc/file2d4606aa.so, 6): Symbol not found: __Z8svdLAS2AP4smatl
Referenced from: /var/folders/TL/TL+wXnanH5uhWm4RtUrrjE+++TM/-Tmp-//RtmpAA9upc/file2d4606aa.so
Expected in: flat namespace
in /var/folders/TL/TL+wXnanH5uhWm4RtUrrjE+++TM/-Tmp-//RtmpAA9upc/file2d4606aa.so
Do I need to be creating my library with some special compilation flags?
Edit 2: it looks like my libargs parameter has no effect, so libsvd symbols never make it into the library. I can find no way to include libraries using cxxfunction() - here's what I'd tried, but the extra parameters (wishful-thinkingly-borrowed from cfunction()) are silently gobbled up:
fn <- cxxfunction(sig=c(nrow="integer", mi="long", mp="long", mx="numeric"),
body=code,
includes="#include <svdlib.h>\n",
cppargs="-I/Users/u0048513/Downloads/SVDLIBC",
libargs="-L/Users/u0048513/Downloads/SVDLIBC -lsvd",
plugin="Rcpp",
verbose=TRUE)
I feel like I'm going about this whole process wrong, since nothing's working. Anyone kick me in the right direction?
I decided to also post a query on the Rcpp-devel mailing list, and got some good advice & help from Dirk and Doug:
http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2011-February/001851.html
I'm still not super-facile with this stuff, but getting there. =)
I've done something similar for a [R]-Smalltalk-interface last year and went about it more generic to be able to pass all data back-and-forth by using byte-arrays:
In C i have:
DLLIMPORT void getLengthOfNextMessage(byte* a);
DLLIMPORT void getNextMessage(byte* a);
In R:
getLengthOfNextMessage <- function() {
tmp1 <- as.raw(rep(0,4))
tmp2<-.C("getLengthOfNextMessage", tmp1)
return(bvToInt(tmp2))
}
receiveMessage <- function() {
#if(getNumberOfMessages()==0) {
# print("error: no messages")
# return();
#}
tmp1<-as.raw(rep(0, getLengthOfNextMessage()+getSizeOfMessages()))
tmp2<-.C("getNextMessage", tmp1)
msg<-as.raw(tmp2[[1]])
print(":::confirm received")
print(bvToInt(msg[13:16]))
# confirmReceived(bvToInt(msg[13:16]))
return(msg)
}
I have commented-out the use of the functions getNumberOfMessages() and confirmReceived() which are specific to the problem i had to solve (multiple back-and-forth communication). Essentially, the code uses the argument byte-array to transfer the information, first the 4-byte-long length-info, then the actual data. This seems less elegant (even to me) than to use structs, but i found it to be more generic and i can hook into any dll, transfering any datatype.
Related
I'm just getting my feet wet in R and was surprised to see that a function doesn't modify an object, at least it seems that's the default. For example, I wrote a function just to stick an asterisk on one label in a table; it works inside the function but the table itself is not changed. (I'm coming mainly from Ruby)
So, what is the normal, accepted way to use functions to change objects in R? How would I add an asterisk to the table title?
Replace the whole object: myTable = title.asterisk(myTable)
Use a work-around to call by reference (as described, for example, in Call by reference in R by TszKin Julian?
Use some structure other than a function? An object method?
The reason you're having trouble is the fact that you are passing the object into the local namespace of the function. This is one of the great / terrible things about R: it allows implicit variable declarations and then implements supercedence as the namespaces become deeper.
This is affecting you because a function creates a new namespace within the current namespace. The object 'myTable' was, I assume, originally created in the global namespace, but when it is passed into the function 'title.asterisk' a new function-local namespace now has an object with the same properties. This works like so:
title.asterisk <- function(myTable){ do some stuff to 'myTable' }
In this case, the function 'title.asterisk' does not make any changes to the global object 'myTable'. Instead, a local object is created with the same name, so the local object supercedes the global object. If we call the function title.asterisk(myTable) in this way, the function makes changes only to the local variable.
There are two direct ways to modify the global object (and many indirect ways).
Option 1: The first, as you mention, is to have the function return the object and overwrite the global object, like so:
title.asterisk <- function(myTable){
do some stuff to 'myTable'
return(myTable)
}
myTable <- title.asterisk(myTable)
This is okay, but you are still making your code a little difficult to understand, since there are really two different 'myTable' objects, one global and one local to the function. A lot of coders clear this up by adding a period '.' in front of variable arguments, like so:
title.asterisk <- function(.myTable){
do some stuff to '.myTable'
return(.myTable)
}
myTable <- title.asterisk(myTable)
Okay, now we have a visual cue that the two variables are different. This is good, because we don't want to rely on invisible things like namespace supercedence when we're trying to debug our code later. It just makes things harder than they have to be.
Option 2: You could just modify the object from within the function. This is the better option when you want to do destructive edits to an object and don't want memory inflation. If you are doing destructive edits, you don't need to save an original copy. Also, if your object is suitably large, you don't want to be copying it when you don't have to. To make edits to a global namespace object, simply don't pass it into or declare it from within the function.
title.asterisk <- function(){ do some stuff to 'myTable' }
Now we are making direct edits to the object 'myTable' from within the function. The fact that we aren't passing the object makes our function look to higher levels of namespace to try and resolve the variable name. Lo, and behold, it finds a 'myTable' object higher up! The code in the function makes the changes to the object.
A note to consider: I hate debugging. I mean I really hate debugging. This means a few things for me in R:
I wrap almost everything in a function. As I write my code, as soon as I get a piece working, I wrap it in a function and set it aside. I make heavy use of the '.' prefix for all my function arguments and use no prefix for anything that is native to the namespace it exists in.
I try not to modify global objects from within functions. I don't like where this leads. If an object needs to be modified, I modify it from within the function that declared it. This often means I have layers of functions calling functions, but it makes my work both modular and easy to understand.
I comment all of my code, explaining what each line or block is intended to do. It may seem a bit unrelated, but I find that these three things go together for me. Once you start wrapping coding in functions, you will find yourself wanting to reuse more of your old code. That's where good commenting comes in. For me, it's a necessary piece.
The two paradigms are replacing the whole object, as you indicate, or writing 'replacement' functions such as
`updt<-` <- function(x, ..., value) {
## x is the object to be manipulated, value the object to be assigned
x$lbl <- paste0(x$lbl, value)
x
}
with
> d <- data.frame(x=1:5, lbl=letters[1:5])
> d
x lbl
1 1 a
2 2 b
3 3 c
> updt(d) <- "*"
> d
x lbl
1 1 a*
2 2 b*
3 3 c*
This is the behavior of, for instance, $<- -- in-place update the element accessed by $. Here is a related question. One could think of replacement functions as syntactic sugar for
updt1 <- function(x, ..., value) {
x$lbl <- paste0(x$lbl, value)
x
}
d <- updt1(d, value="*")
but the label 'syntactic sugar' doesn't really do justice, in my mind, to the central paradigm that is involved. It is enabling convenient in-place updates, which is different from the copy-on-change illusion that R usually maintains, and it is really the 'R' way of updating objects (rather than using ?ReferenceClasses, for instance, which have more of the feel of other languages but will surprise R users expecting copy-on-change semantics).
For anybody in the future looking for a simple way (do not know if it is the more appropriate one) to get this solved:
Inside the function create the object to temporally save the modified version of the one you want to change. Use deparse(substitute()) to get the name of the variable that has been passed to the function argument and then use assign() to overwrite your object. You will need to use envir = parent.frame() inside assign() to let your object be defined in the environment outside the function.
(MyTable <- 1:10)
[1] 1 2 3 4 5 6 7 8 9 10
title.asterisk <- function(table) {
tmp.table <- paste0(table, "*")
name <- deparse(substitute(table))
assign(name, tmp.table, envir = parent.frame())
}
(title.asterisk(MyTable))
[1] "1*" "2*" "3*" "4*" "5*" "6*" "7*" "8*" "9*" "10*"
Using parentheses when defining an object is a little more efficient (and to me, better looking) than defining then printing.
I'm just getting my feet wet in R and was surprised to see that a function doesn't modify an object, at least it seems that's the default. For example, I wrote a function just to stick an asterisk on one label in a table; it works inside the function but the table itself is not changed. (I'm coming mainly from Ruby)
So, what is the normal, accepted way to use functions to change objects in R? How would I add an asterisk to the table title?
Replace the whole object: myTable = title.asterisk(myTable)
Use a work-around to call by reference (as described, for example, in Call by reference in R by TszKin Julian?
Use some structure other than a function? An object method?
The reason you're having trouble is the fact that you are passing the object into the local namespace of the function. This is one of the great / terrible things about R: it allows implicit variable declarations and then implements supercedence as the namespaces become deeper.
This is affecting you because a function creates a new namespace within the current namespace. The object 'myTable' was, I assume, originally created in the global namespace, but when it is passed into the function 'title.asterisk' a new function-local namespace now has an object with the same properties. This works like so:
title.asterisk <- function(myTable){ do some stuff to 'myTable' }
In this case, the function 'title.asterisk' does not make any changes to the global object 'myTable'. Instead, a local object is created with the same name, so the local object supercedes the global object. If we call the function title.asterisk(myTable) in this way, the function makes changes only to the local variable.
There are two direct ways to modify the global object (and many indirect ways).
Option 1: The first, as you mention, is to have the function return the object and overwrite the global object, like so:
title.asterisk <- function(myTable){
do some stuff to 'myTable'
return(myTable)
}
myTable <- title.asterisk(myTable)
This is okay, but you are still making your code a little difficult to understand, since there are really two different 'myTable' objects, one global and one local to the function. A lot of coders clear this up by adding a period '.' in front of variable arguments, like so:
title.asterisk <- function(.myTable){
do some stuff to '.myTable'
return(.myTable)
}
myTable <- title.asterisk(myTable)
Okay, now we have a visual cue that the two variables are different. This is good, because we don't want to rely on invisible things like namespace supercedence when we're trying to debug our code later. It just makes things harder than they have to be.
Option 2: You could just modify the object from within the function. This is the better option when you want to do destructive edits to an object and don't want memory inflation. If you are doing destructive edits, you don't need to save an original copy. Also, if your object is suitably large, you don't want to be copying it when you don't have to. To make edits to a global namespace object, simply don't pass it into or declare it from within the function.
title.asterisk <- function(){ do some stuff to 'myTable' }
Now we are making direct edits to the object 'myTable' from within the function. The fact that we aren't passing the object makes our function look to higher levels of namespace to try and resolve the variable name. Lo, and behold, it finds a 'myTable' object higher up! The code in the function makes the changes to the object.
A note to consider: I hate debugging. I mean I really hate debugging. This means a few things for me in R:
I wrap almost everything in a function. As I write my code, as soon as I get a piece working, I wrap it in a function and set it aside. I make heavy use of the '.' prefix for all my function arguments and use no prefix for anything that is native to the namespace it exists in.
I try not to modify global objects from within functions. I don't like where this leads. If an object needs to be modified, I modify it from within the function that declared it. This often means I have layers of functions calling functions, but it makes my work both modular and easy to understand.
I comment all of my code, explaining what each line or block is intended to do. It may seem a bit unrelated, but I find that these three things go together for me. Once you start wrapping coding in functions, you will find yourself wanting to reuse more of your old code. That's where good commenting comes in. For me, it's a necessary piece.
The two paradigms are replacing the whole object, as you indicate, or writing 'replacement' functions such as
`updt<-` <- function(x, ..., value) {
## x is the object to be manipulated, value the object to be assigned
x$lbl <- paste0(x$lbl, value)
x
}
with
> d <- data.frame(x=1:5, lbl=letters[1:5])
> d
x lbl
1 1 a
2 2 b
3 3 c
> updt(d) <- "*"
> d
x lbl
1 1 a*
2 2 b*
3 3 c*
This is the behavior of, for instance, $<- -- in-place update the element accessed by $. Here is a related question. One could think of replacement functions as syntactic sugar for
updt1 <- function(x, ..., value) {
x$lbl <- paste0(x$lbl, value)
x
}
d <- updt1(d, value="*")
but the label 'syntactic sugar' doesn't really do justice, in my mind, to the central paradigm that is involved. It is enabling convenient in-place updates, which is different from the copy-on-change illusion that R usually maintains, and it is really the 'R' way of updating objects (rather than using ?ReferenceClasses, for instance, which have more of the feel of other languages but will surprise R users expecting copy-on-change semantics).
For anybody in the future looking for a simple way (do not know if it is the more appropriate one) to get this solved:
Inside the function create the object to temporally save the modified version of the one you want to change. Use deparse(substitute()) to get the name of the variable that has been passed to the function argument and then use assign() to overwrite your object. You will need to use envir = parent.frame() inside assign() to let your object be defined in the environment outside the function.
(MyTable <- 1:10)
[1] 1 2 3 4 5 6 7 8 9 10
title.asterisk <- function(table) {
tmp.table <- paste0(table, "*")
name <- deparse(substitute(table))
assign(name, tmp.table, envir = parent.frame())
}
(title.asterisk(MyTable))
[1] "1*" "2*" "3*" "4*" "5*" "6*" "7*" "8*" "9*" "10*"
Using parentheses when defining an object is a little more efficient (and to me, better looking) than defining then printing.
I'm learning about external pointers, XPtr, in Rcpp. I made the following test functions:
// [[Rcpp::export]]
Rcpp::XPtr<arma::Mat<int>> create_xptr(int i, int j) {
arma::Mat<int>* ptr(new arma::Mat<int>(i, j));
Rcpp::XPtr<arma::Mat<int>> p(ptr, true);
return p;
}
// [[Rcpp::export]]
void fill_xptr(Rcpp::XPtr<arma::Mat<int>>& xptr, int k) {
(*xptr).fill(k);
}
// [[Rcpp::export]]
arma::Mat<int> return_val(Rcpp::XPtr<arma::Mat<int>>& xptr) {
return *xptr;
}
Now on the R side I can of-course create an instance and work with it:
x <- create_xptr(1000, 1000) # (1)
Say, for some reason I accidentally called create_xptr again and assigned the result to x, i.e
x <- create_xptr(1000, 1000) # (2)
Then, I have no longer access to the pointer i created in (1) which makes sense and hence, I cannot free the memory. What I would like is, that the second time (2) it just overwrite the first one (1). And secondly, if I create an external pointer in some local scope (say a simple for loop), should the memory used then be freed automatically when it goes out of scope? I've tried the following:
for (i in 1:3) {
a <- create_xptr(1000, 100000)
fill_xptr(a, 1)
}
But it just adds to the memory usage for each i.
I have tried reading some code on different git-repos, old posts here on stack read a little about finalizers and garbage collection in R. I can't seem to put together the puzzle.
We use external pointers for things that do not have already existing interfaces such as database connections or objects from other new libraries. You may be better off looking at some existing uses of XPtr (or the other external pointer variants in some other packages, there are two small ones on CRAN).
And I don't think I can think of an example directly referencing this in R. It is mostly for "wrapping" external objects to hold on to them and to pass them around for further use elsewhere. And you are correct that you need to read up a little on finalizers. I find reading Writing R Extensions, as dense as it is, to be the best source because you need to get the initial "basics" in C right first.
I'm trying to package some code I use for data analysis so that other workers can use it. Currently, I'm stuck trying to write a simple function that imports data from a specific file type generated by a datalogger and trims it for use by other functions. Here's the code:
import<-function(filename,type="campbell",nprobes){
if (filename==TRUE){
if (type=="campbell"){
message("File import type is from Campbell CR1000")
flux.data<<-read.table(filename,sep=",",header=T,skip=1)
flux.data<<-flux.data[,-c(1,2)];flux.data<<-flux.data[-c(1,2),]
if (nprobes=="missing"){
nprobes<-32
}
flux.data<<-flux.data[,c(1:nprobes)]
flux.data.names<<-colnames(flux.data) #Saves column names
}
}
}
Ideally, the result would be a dataframe/matrix flux.data and a concomittant vector/list of the preserved column headers flux.data.names. The code runs and the function executes without errors, but the outputs aren't preserved. I usually use <<- to get around the function enclosure but its not working in this case - any suggestions?
I think the real problem is that I don't quite understand how enclosures work, despite a lot of reading... should I be using environment to assign environments within the function?
User joran answered my question in the comments above:
The critical issue was just in how the function was written: the conditional at the start (if filename==TRUE) was intended to see if filename was specified, and instead was checking to see if it literally equaled TRUE. The result was the conditional never being met, and no function output. Here's what fixed it:
import<-function(filename,type="campbell",nprobes){
if (exists(filename){
if (type=="campbell"){
#etc....
Another cool thing he pointed out was that I didn't need the <<- operator to utilize the function output and instead could write return(flux.data). This is a much more flexible approach, and helped me understand function enclosures a lot better.
I have the situation where I have written an R function, ComplexResult, that computes a computationally expensive result that two other separate functions will later use, LaterFuncA and LaterFuncB.
I want to store the result of ComplexResult somewhere so that both LaterFuncA and LaterFuncB can use it, and it does not need to be recalculated. The result of ComplexResult is a large matrix that only needs to be calculated once, then re-used later on.
R is my first foray into the world of functional programming, so interested to understand what it considered good practice. My first line of thinking is as follows:
# run ComplexResult and get the result
cmplx.res <- ComplexResult(arg1, arg2)
# store the result in the global environment.
# NB this would not be run from a function
assign("CachedComplexResult", cmplx.res, envir = .GlobalEnv)
Is this at all the right thing to do? The only other approach I can think of is having a large "wrapper" function, e.g.:
MyWrapperFunction <- function(arg1, arg2) {
cmplx.res <- ComplexResult(arg1, arg2)
res.a <- LaterFuncA(cmplx.res)
res.b <- LaterFuncB(cmplx.res)
# do more stuff here ...
}
Thoughts? Am I heading at all in the right direction with either of the above? Or is an there Option C which is more cunning? :)
The general answer is you should Serialize/deSerialize your big object for further use. The R way to do this is using saveRDS/readRDS:
## save a single object to file
saveRDS(cmplx.res, "cmplx.res.rds")
## restore it under a different name
cmplx2.res <- readRDS("cmplx.res.rds")
This assign to GlobalEnv:
CachedComplexResult <- ComplexResult(arg1, arg2)
To store I would use:
write.table(CachedComplexResult, file = "complex_res.txt")
And then to use it directly:
LaterFuncA(read.table("complex_res.txt"))
Your approach works for saving to local memory; other answers have explained saving to global memory or a file. Here are some thoughts on why you would do one or the other.
Save to file: this is slowest, so only do it if your process is volatile and you expect it to crash hard and you need to pick up the pieces where it left off, OR if you just need to save the state once in a while where speed/performance is not a concern.
Save to global: if you need access from multiple spots in a large R program.