I am often confronted with the following situation when I debug my Julia code:
I suspect that a certain variable (often a large matrix) deep inside my code is not what I intended it to be and I want to have a closer look at it. Ideally, I want to have access to it in the REPL so I can play around with it.
What is the best practice to get access to variables several function layers deep without passing them up the chain, i.e. changing the function returns?
Example:
function multiply(u)
v = 2*u
w = subtract(v)
return w
end
function subtract(x)
i = x-5
t = 10
return i-3t
end
multiply(10)
If I run multiply() and suspect that the intermediate variable i is not what I assume it should be, how would I gain access to it in the REPL?
I know that I could just write a test function and test that i has the intended properties right inside subtract(), but sometimes it would just be quicker to use the REPL.
This is the same in any programming language. You can use debugging tools like ASTInterpreter2 (which has good Juno integration) to step through your code and have an interactive REPL in the current environment, or you can use println debugging where you run the code with #show commands in there to print out values.
Related
Is this bad practice? It seems like a lot could go wrong here.*
I am setting the argument of an outer function to be a global variable for a function defined inside it. I am just doing this to work around some existing code.
f = function(a,b){h = function(c){print(b);b+c}}
myh = f(1,2)
myh(7)
#[1] 2
#[1] 9
*On the other hand, it's perfectly acceptable to write something like
h = function(c){print(7);7+c}
Creating a function that creates functions (or a function factory) is a totally acceptable code practice. See https://adv-r.hadley.nz/function-factories.html for more details on certain parts of the technical implementation in R.
It is most often used if you need to create functions at runtime or you need to create a lot of similar funcions.
The function factory you have created could be considered similar to a function factory that would create different sized counters that told the user how much the amount was incremented by.
It is important to keep track of the functions you create this way however.
Let me know if you'd like more clarification on anything.
(One possible bad practise in the function you have created though is an unused argument a).
In short: How can I call, from within Rccp C++ code, the agrep C internal function that gets called when users use the regular agrep function from base R?
In long: I have found multiple questions here about how to invoke, from within Rcpp, a C or C++ function created for another package (e.g. using C function from other package in Rcpp
and Rcpp: Call C function from a package within Rcpp).
The thing that I am trying to achieve, however, is at the same time simpler but also way less documented: it is to directly call, from within Rcpp, a .Internal C function that comes with base R rather than another package, without interfacing with R (that is, without doing what is said in Call R functions in Rcpp). How could I do that for the .Internal C function that lays underneath base R's agrep wrapper?
The specific function I am trying to call here is the agrep internal C function. And for context, what I am ultimately trying to achieve is to speed-up a call to agrep for when millions of patterns must be each checked against each of millions of x targets.
Great question. The long and short of it is "You cant" (in many cases) unless the function is visible in one of the header files in "src/include/". At least not that easily.
Not long ago I had a similar fun challenge, where I tried to get access to the do_docall function (called by do.call), and it is not a simple task. First of all, it is not directly possible to just #include <agrep.c> (or something similar). That file simply isn't available for inclusion, as it is not a part of the "src/include". It is compiled and the uncompiled file is removed (not to mention that one should never "include" a .c file).
If one is willing to go the mile, then the next step one could look at is "copying" and "altering" the source code. Basically find the function in "src/main/agrep.c", copy it into your package and then fix any errors you find.
Problems with this approach:
As documented in R-exts the internal structures of sexprec_info is not made public (this is the base structure for all objects in R). Many internal function use the fields within this structure, so one has to "copy" the structure into your source code, to make it public to your code specifically.
If you ever #include <Rcpp.h> prior to this file, you will need to go through each and every call to internal functions and likely add either R_ or Rf_.
The function may contain calls to other "internal" functions, that further needs to be copied and altered for it to work.
You will also need to get a clear understanding of what CDR, CAR and similar does. The internal functions have a documented structure, where the first argument contains the full call passed to the function, and function like those 2 are used to access parts of the call.
I did myself a solid and rewrote do_docall changing the input format, to avoid having to consider this. But this takes time. The alternative is to create a pairlist according to the documentation, set its type as a call-sexp (the exact name is lost to me at the moment) and pass the appropriate arguments for op, args and env.
And lastly, if you go through the steps, and find that it is necessary to copy the internal structures of sexprec_info (as described later), then you will need to be very careful about when you include Rinternals and Rcpp, as any one of these causes your code to crash and burn in the most beautiful and silent way if you include your header and these in the wrong order! Note that this even goes for [[Rcpp::export]], which may indeed turn out to include them in the wrong arbitrary order!
If you are willing to go this far down the drainage, I would suggest carefully reading adv-R "R's C interface" and Chapter 2, 5 and 6 of R-ext and maybe even the R internal manual, and finally once that is done take a look at do_docall from src/main/coerce.c and compare it to the implementation in my repository cmdline.arguments/src/utils/{cmd_coerce.h, cmd_coerce.c}. In this version I have
Added all the internal structures that are not public, so that I can access their unmodified form (unmodified by the current session).
This includes the table used to store the currently used SEXP's, that was used as a lookup. This caused a problem as I can't access the modified version, so my code is slightly altered with the old code blocked by the macro #if --- defined(CMDLINE_ARGUMENTS_MAYBE_IN_THE_FUTURE). Luckily the code causing a problem had a static answer, so I could work around this (but this might not always be the case).
I added quite a few Rf_s as their macro version is not available (since I #include <Rcpp.h> at some point)
The code has been split into smaller functions to make it more readable (for my own sake).
The function has one additional argument (name), that is not used in the internal function, with some added errors (for my specific need).
This implementation will be frozen "for all time to come" as I've moved on to another branch (and this one is frozen for my own future benefit, if I ever want to walk down this path again).
I spent a few days scouring the internet for information on this and found 2 different posts, talking about how this could be achieved, and my approach basically copies this. Whether this is actually allowed in a cran package, is an whole other question (and not one that I will be testing out).
This approach goes again if you want to use not-public code from other packages. While often here it is as simple as "copy-paste" their files into your repository.
As a final side note, you mention the intend is to "speed up" your code for when you have to perform millions upon millions of calls to agrep. It seems that this is a time where one should consider performing the task in parallel. Even after going through the steps outlined above, creating N parallel sessions to take care of K evaluations each (say 100.000), would be the first step to reduce computing time. Of course each session should be given a batch and not a single call to agrep.
I am a very new Julia user (coming from Matlab), so forgive me if I ask a very dumb question.
I currently have a julia code, which works (it runs fine) though it provides different results if I execute it as a function or if I run every of the function lines interactively.
My script is mostly about linear algebra and uses Arrays and Dicts.
As I have some trouble making use of the Juno debugger, I did not find another way to debug my code, which is quite a shame.
I spent the last three hours on this and I still have no clue why these results differ.
I suspect I don't understand some very basic working process of julia related to variable allocation but I'm flying blind here.
Does anyone have a explaination for this behavior ?
I can't provide the code here but here is the base structure of the code. Basically the M matrix returned by childfunction is wrong. a is a scalar a dict is a dictionary.
calling function
function motherfunction(...)
M = childfunction(a,dict)
end
child function
function childfunction(...)
...
M = *some linear algebra*
return M
end
Here is a piece of R code that writes to each element of a matrix in a reference class. It runs incredibly slowly, and I’m wondering if I’ve missed a simple trick that will speed this up.
nx = 2000
ny = 10
ref_matrix <- setRefClass(
"ref_matrix",fields = list(data = "matrix"),
)
out <- ref_matrix(data = matrix(0.0,nx,ny))
#tracemem(out$data)
for (iy in 1:ny) {
for (ix in 1:nx) {
out$data[ix,iy] <- ix + iy
}
}
It seems that each write to an element of the matrix triggers a check that involves a copy of the entire matrix. (Uncommenting the tracemen() call shows this.) Now, I’ve found a discussion that seems to confirm this:
https://r-devel.r-project.narkive.com/8KtYICjV/rd-copy-on-assignment-to-large-field-of-reference-class
and this also seems to be covered by Speeding up field access in R reference classes
but in both of these this behaviour can be bypassed by not declaring a class for the field, and this works for the example in the first link which uses a 1D vector, b, which can just be set as b <<- 1:10000. But I’ve not found an equivalent way of creating a 2D array without using a explicit “matrix” instance.
Am I just missing something simple, or is this actually not possible?
Let me add a couple of things. First, I’m very new to R, so could easily have missed something. Second, I’m really just curious about the way reference classes work in this case and whether there’s a simple way to use them efficiently; I’m not looking for a really fast way to set the elements of a matrix - I can do that by not having the matrix in a reference class at all, and if I really care about speed I can write a C routine to do it and call it from R.
Here’s some background that might explain why I’m interested in this, which you’re welcome to ignore.
I got here by wanting to see how different languages, and even different compiler options and different ways of coding the same operation, compared for efficiency when accessing 2D rectangular arrays. I’ve been playing with a test program that creates two 2D arrays of the same size, and calls a subroutine that sets the first to the elements of the second plus their index values. (Almost any operation would do, but this one isn’t completely trivial to optimise.) I have this in a number of languages now, C, C++, Julia, Tcl, Fortran, Swift, etc., even hand-coded assembler (spoiler alert: assembler isn’t worth the effort any more) and thought I’d try R. The obvious implementation in R passes the two arrays to a subroutine that does the work, but because R doesn’t normally pass by reference, that routine has to make a copy of the modified array and return that as the function value. I thought using a reference class would avoid the relatively minor overhead of that copy, so I tried that and was surprised to discover that, far from speeding things up, it slowed them down enormously.
Use outer:
out$data <- outer(1:ny, 1:nx, `+`)
Also, don't use reference classes (or R6 classes) unless you actually need reference semantics. KISS and all that.
I've written an experimental function evaluator that allows me to bind simple functions together such that when the variables change, all functions that rely on those variables (and the functions that rely on those functions, etc.) are updated simultaneously. The way I do this is instead of evaluating the function immediately as it's entered in, I store the function. Only when an output value is requested to I evaluate the function, and I evaluate it each and every time an output value is requested.
For example:
pi = 3.14159
rad = 5
area = pi * rad * rad
perim = 2 * pi * rad
I define 'pi' and 'rad' as variables (well, functions that return a constant), and 'area' and 'perim' as functions. Any time either 'pi' or 'rad' change, I expect the results of 'area' and 'perim' to change in kind. Likewise, if there were any functions depending on 'area' or 'perim', the results of those would change as well.
This is all working as expected. The problem here is when the user introduces recursion - either accidental or intentional. There is no logic in my grammar - it's simply an evaluator - so I can't provide the user with a way to 'break out' of recursion. I'd like to prevent it from happening at all, which means I need a way to detect it and declare the offending input as invalid.
For example:
a = b
b = c
c = a
Right now evaluating the last line results in a StackOverflowException (while the first two lines evaluate to '0' - an undeclared variable/function is equal to 0). What I would like to do is detect the circular logic situation and forbid the user from inputing such a statement. I want to do this regardless of how deep the circular logic is hidden, but I have no idea how to go about doing so.
Behind the scenes, by the way, input strings are converted to tokens via a simple scanner, then to an abstract syntax tree via a hand-written recursive descent parser, then the AST is evaluated. The language is C#, but I'm not looking for a code solution - logic alone will be fine.
Note: this is a personal project I'm using to learn about how parsers and compilers work, so it's not mission critical - however the knowledge I take away from this I do plan to put to work in real life at some point. Any help you guys can provide would be appreciated greatly. =)
Edit: In case anyone's curious, this post on my blog describes why I'm trying to learn this, and what I'm getting out of it.
I've had a similar problem to this in the past.
My solution was to push variable names onto a stack as I recursed through the expressions to check syntax, and pop them as I exited a recursion level.
Before I pushed each variable name onto the stack, I would check if it was already there.
If it was, then this was a circular reference.
I was even able to display the names of the variables in the circular reference chain (as they would be on the stack and could be popped off in sequence until I reached the offending name).
EDIT: Of course, this was for single formulae... For your problem, a cyclic graph of variable assignments would be the better way to go.
A solution (probably not the best) is to create a dependency graph.
Each time a function is added or changed, the dependency graph is checked for cylces.
This can be cut short. Each time a function is added, or changed, flag it. If the evaluation results in a call to the function that is flagged, you have a cycle.
Example:
a = b
flag a
eval b (not found)
unflag a
b = c
flag b
eval c (not found)
unflag b
c = a
flag c
eval a
eval b
eval c (flagged) -> Cycle, discard change to c!
unflag c
In reply to the comment on answer two:
(Sorry, just messed up my openid creation so I'll have to get the old stuff linked later...)
If you switch "flag" for "push" and "unflag" for "pop", it's pretty much the same thing :)
The only advantage of using the stack is the ease of which you can provide detailed information on the cycle, no matter what the depth. (Useful for error messages :) )
Andrew