How to verify number of function evaluations when profiling R code

How to verify number of function evaluations when profiling R code - r

When profiling R code with Rprof-type functions we get the time spent in function alone and the time spent in function and callees. However, as far as I know we don't get the number of times a given function was evaluated.
For example, assume I wants to compare two integration functions:
integrate_1(myfunc, from = -Inf, to = Inf)
integrate_2(myfunc, from = -Inf, to Inf)
I could easily see how much time each function takes and where this time was spent, but I don't know how to check how many times myfunc had to be evaluated in each of the integrate functions.
Thanks,

One way of implementing Joran's counter method is to use the trace function.
For example, first we set the counter to zero. (Assigned in the global environment, for convenience.)
count <- 0
Then set up the trace. Here we set it on the identity function (that just returns the value that you input to it).
trace("identity", quote(count <<- count + 1), print = FALSE)
Now whenever identity is called, the value of count is incremented. print = FALSE just stops a message being printed to the console when the function is called.
Let's call the function a few times and inspect the count:
for(i in seq_len(123)) identity(1)
count
## [1] 123

Rprof works by sampling the call stack on a timer. It does not count calls.
It records the sampled call stacks in a file, and though it does not record line numbers where calls occur, those samples are still useful for seeing what causes time to be spent.
For example, if you happen to look at M random samples, and you see a pattern like A calling B calling C on N of them, then you know the program spends roughly fraction N/M of its time doing that (assuming N > 1).
If you see such a thing, and you can think of a way to avoid even part of it, you will save a substantial fraction of the total time.
Rprof comes with a summarization tool that gives you the kind of numbers you mentioned, but I don't find those numbers useful anyway.
I would much rather get a real sense of what's happening.

Related

Seed limits in R

Consensus on set.seed in R is that that it effectively generates a long sequence of pseudo-random numbers, pre-determined by the seed. Then the first call you make to this sequence (with the first non-deterministic function you use) takes the first batch from that sequence, the second call takes the next batch, so forth.
I am wondering what the limits to this are. Specifically, what happens when you get to the end of that long sequence? Let's say, after setting a seed, you then sample from the first 100 integers repeatedly. Would there come a point where you start generating the same samples (in the same order) as you were seeing at the beginning? How long would this take? (Does it depend on the seed?) If not, how would reaching the 'end' of the sequence and presumably circling back to the beginning manifest?

The ?RNGkind help page in R gives more details on the default random number generator, the "Mersenne Twister" algorithm:
"Mersenne-Twister": From Matsumoto and Nishimura (1998); code
updated in 2002. A twisted GFSR with period 2^19937 - 1 and
equidistribution in 623 consecutive dimensions (over the
whole period). The ‘seed’ is a 624-dimensional set of 32-bit
integers plus a current position in that set.
As stated there, the "period" (the length of time it takes to get back to the beginning and start repeating values is 2^19937-1, or approximately 10^(19937/log2(10)) = 10^6001.
If the size of your "batches" happened to line up exactly with the period, then you would indeed start getting the same batches again.
I'm not sure how many pseudorandom samples R uses to pick a sample of size 1 from a set. Ideally it would be only 1 (so your "batch size" would be 1), but it might be more depending on the generality/complexity of the sampling algorithm.
I know that runif() translates more or less directly from the PRNG, so a sequence of runif() calls would indeed repeat exactly.

Ignoring errors in R

I'm running a complex but relatively quick simulation in R (takes about 5-10 minutes per simulation) and I'm beginning to run it in parallel with various input values in order to test the robustness of some of my algorithms.
There seems to be one problem: some arrangements of inputs cause a fatal error within the simulation and the whole code comes crashing down, causing the simulations to end. Is there an easy way to catch the error (which may come from a variety of locations) and have it just ignore those input values and move on to the next?
It's frustrating when I set an array of inputs to check that should take 5-6 hours to run through all the simulations and I come back to find that it crashed in the first 45 minutes.
While I work on trying to fix the bug / identify inputs that push me to that error, any ideas on how to ignore / catch the errors as they come?
Thanks

I don't know how did your organize your simulations, but I guess uu have a loop where you check use new arguments at each step.
You can use tryCatch . Here I am throwing an error if I have bad input.
step.simul <- function (x) {
stopifnot(x%%2 == 1, x>0)
(x - 1)/2
}
Then using tryCatch, I flag the bad inputs with a code
that tells you about the bad input:
sapply(1:5, function(i)tryCatch(step.simul(i), error=function(e)-1000-i))
[1] 0 -1002 1 -1004 2
As you see my simulations runs over all the loop index.

Convergence using for loop in R

I have the following code:
for(n in 1:1000){
..............
}
This will run ............ 1000 times. I havent put the full code in because its extremely long and not relevant to the answer
My question is there any way i can get the code to run until it reaches a specified convergence value to four decimal places. There are initial values being fed into this equation which generates new values and the process is continually iterative until a convergence attained (as specified above).
EDIT
I have a set of 4 values at the end of my code with different labels (A, B, C, D). Within my code there are two separate functions when each calculate different values and feed each other. So when i say convergence, i mean that when function 1 tells function 2 specific values and it calculates new values for A, B, C and D and the cycle continues and the next time these values are the same in as calculated by function 2
The key question im asking here is what format the code should take (the below would suggest that repeat is perferrable) and how to code the convergence criteria correctly as the assignment notation for successive iterations will be the same.

Just making an answer out of my comment, I think often repeat will be the best here. It doesn't require you to evaluate the condition at the start and doesn't stop after a finite number of iterations (unless of course that is what you want):
repeat
{
# Do stuff
if (condition) break
}

If you are just looking for a way of exiting for loops you can just use break.
for (n in 1:1000)
{
...
if (condition)
break;
}

You could always just use a while loop if you don't know how many iterations it will take. The general form could look something like this:
while(insert_convergence_check_here){
insert_your_code_here
}
Edit: In response to nico's comment I should add that you could also follow this pattern to essentially create a do/while loop in case you need the loop to run at least once before you can check the convergence criteria.
continue_indicator <- TRUE
while(continue_indicator){
insert_your_code_here
continue_indicator <- convergence_check_here
}

modifying an element of a list in-place in J, can it be done?

I have been playing with an implementation of lookandsay (OEIS A005150) in J. I have made two versions, both very simple, using while. type control structures. One recurs, the other loops. Because I am compulsive, I started running comparative timing on the versions.
look and say is the sequence 1 11 21 1211 111221 that s, one one, two ones, etc.
For early elements of the list (up to around 20) the looping version wins, but only by a tiny amount. Timings around 30 cause the recursive version to win, by a large enough amount that the recursive version might be preferred if the stack space were adequate to support it. I looked at why, and I believe that it has to do with handling intermediate results. The 30th number in the sequence has 5808 digits. (32nd number, 9898 digits, 34th, 16774.)
When you are doing the problem with recursion, you can hold the intermediate results in the recursive call, and the unstacking at the end builds the results so that there is minimal handling of the results.
In the list version, you need a variable to hold the result. Every loop iteration causes you to need to add two elements to the result.
The problem, as I see it, is that I can't find any way in J to modify an extant array without completely reassigning it. So I am saying
try. o =. o,e,(0&{y) catch. o =. e,(0&{y) end.
to put an element into o where o might not have a value when we start. That may be notably slower than
o =. i.0
.
.
.
o =. (,o),e,(0&{y)
The point is that the result gets the wrong shape without the ravels, or so it seems. It is inheriting a shape from i.0 somehow.
But even functions like } amend don't modify a list, they return a list that has a modification made to it, and if you want to save the list you need to assign it. As the size of the assigned list increases (as you walk the the number from the beginning to the end making the next number) the assignment seems to take more time and more time. This assignment is really the only thing I can see that would make element 32, 9898 digits, take less time in the recursive version while element 20 (408 digits) takes less time in the loopy version.
The recursive version builds the return with:
e,(0&{y),(,lookandsay e }. y)
The above line is both the return line from the function and the recursion, so the whole return vector gets built at once as the call gets to the end of the string and everything unstacks.
In APL I thought that one could say something on the order of:
a[1+rho a] <- new element
But when I try this in NARS2000 I find that it causes an index error. I don't have access to any other APL, I might be remembering this idiom from APL Plus, I doubt it worked this way in APL\360 or APL\1130. I might be misremembering it completely.
I can find no way to do that in J. It might be that there is no way to do that, but the next thought is to pre-allocate an array that could hold results, and to change individual entries. I see no way to do that either - that is, J does not seem to support the APL idiom:
a<- iota 5
a[3] <- -1
Is this one of those side effect things that is disallowed because of language purity?
Does the interpreter recognize a=. a,foo or some of its variants as a thing that it should fastpath to a[>:#a]=.foo internally?
This is the recursive version, just for the heck of it. I have tried a bunch of different versions and I believe that the longer the program, the slower, and generally, the more complex, the slower. Generally, the program can be chained so that if you want the nth number you can do lookandsay^: n ] y. I have tried a number of optimizations, but the problem I have is that I can't tell what environment I am sending my output into. If I could tell that I was sending it to the next iteration of the program I would send it as an array of digits rather than as a big number.
I also suspect that if I could figure out how to make a tacit version of the code, it would run faster, based on my finding that when I add something to the code that should make it shorter, it runs longer.
lookandsay=: 3 : 0
if. 0 = # ,y do. return. end. NB. return on empty argument
if. 1 ~: ##$ y do. NB. convert rank 0 argument to list of digits
y =. (10&#.^:_1) x: y
f =. 1
assert. 1 = ##$ y NB. the converted argument must be rank 1
else.
NB. yw =. y
f =. 0
end.
NB. e should be a count of the digits that match the leading digit.
e=.+/*./\y=0&{y
if. f do.
o=. e,(0&{y),(,lookandsay e }. y)
assert. e = 0&{ o
10&#. x: o
return.
else.
e,(0&{y),(,lookandsay e }. y)
return.
end.
)
I was interested in the characteristics of the numbers produced. I found that if you start with a 1, the numerals never get higher than 3. If you start with a numeral higher than 3, it will survive as a singleton, and you can also get a number into the generated numbers by starting with something like 888888888 which will generate a number with one 9 in it and a single 8 at the end of the number. But other than the singletons, no digit gets higher than 3.
Edit:
I did some more measuring. I had originally written the program to accept either a vector or a scalar, the idea being that internally I'd work with a vector. I had thought about passing a vector from one layer of code to the other, and I still might using a left argument to control code. With I pass the top level a vector the code runs enormously faster, so my guess is that most of the cpu is being eaten by converting very long numbers from vectors to digits. The recursive routine always passes down a vector when it recurs which might be why it is almost as fast as the loop.
That does not change my question.
I have an answer for this which I can't post for three hours. I will post it then, please don't do a ton of research to answer it.

assignments like
arr=. 'z' 15} arr
are executed in place. (See JWiki article for other supported in-place operations)
Interpreter determines that only small portion of arr is updated and does not create entire new list to reassign.
What happens in your case is not that array is being reassigned, but that it grows many times in small increments, causing memory allocation and reallocation.
If you preallocate (by assigning it some large chunk of data), then you can modify it with } without too much penalty.

After I asked this question, to be honest, I lost track of this web site.
Yes, the answer is that the language has no form that means "update in place, but if you use two forms
x =: x , most anything
or
x =: most anything } x
then the interpreter recognizes those as special and does update in place unless it can't. There are a number of other specials recognized by the interpreter, like:
199(1000&|#^)199
That combined operation is modular exponentiation. It never calculates the whole exponentiation, as
199(1000&|^)199
would - that just ends as _ without the #.
So it is worth reading the article on specials. I will mark someone else's answer up.

The link that sverre provided above ( http://www.jsoftware.com/jwiki/Essays/In-Place%20Operations ) shows the various operations that support modifying an existing array rather than creating a new one. They include:
myarray=: myarray,'blah'
If you are interested in a tacit version of the lookandsay sequence see this submission to RosettaCode:
las=: ,#((# , {.);.1~ 1 , 2 ~:/\ ])&.(10x&#.inv)#]^:(1+i.#[)
5 las 1
11 21 1211 111221 312211

Digging into R profiling information

I am trying to optimize a bit of code, and am puzzled about information from summaryRprof(). In particular, it looks like a number of calls are made to external C programs, but I'm not able to pin down which C program, from which R function. I am planning to resolve this through a bunch of slicing and dicing of the code, but wondered if I am overlooking some better way to interpret the profiling data.
The highest-consuming function is .Call, which is apparently a generic description for calls to C code; the next leading functions appear to be assignment operations:
$by.self
self.time self.pct total.time total.pct
".Call" 2281.0 54.40 2312.0 55.14
"[.data.frame" 145.0 3.46 218.5 5.21
"initialize" 123.5 2.95 217.5 5.19
"$<-.data.frame" 121.5 2.90 121.5 2.90
"as.vector" 110.5 2.64 416.0 9.92
I decided to focus on the .Call to see how this arises. I looked through the profiling file to find those entries with .Call in the call stack, and the following are the top entries in the call stack (by count of # of appearances):
13640 "eval"
11252 "["
7044 "standardGeneric"
4691 "<Anonymous>"
4658 "tryCatch"
4654 "tryCatchList"
4652 "tryCatchOne"
4648 "doTryCatch"
This list is as clear as mud: I have <Anonymous> and standardGeneric in there.
I believe this is due to calls to functions in the Matrix package, but that's because I'm looking at the code and that package appears to be the only possible source of C code. However, a lot of different functions from Matrix are called in this package, and it seems very difficult to determine which function is consuming this time.
So, my question is pretty basic: is there some way of deciphering and attributing these calls (e.g. .Call, <Anonymous>, etc.) in some other way? The plot of the call graph for this code is rather tricky to render, given the # of functions involved.
The fallback tactics I see are to either (1) comment out bits of code (and hack around to make the code work with this) to see where the time consumption occurs, or to (2) wrap certain operations inside of other functions and see when those functions appear on the call stack. The latter is inelegant, but it seems like it's the best way to add a tag to the call stack. The former is unpleasant because it takes quite some time to run the code, and iteratively uncommenting code and rerunning is unpleasant.

May I suggest you use the profr package. This is another bit of Hadley magic. It's a wrapper around Rprof and gives a visulation of the call stack and timings.
I find profr very easy to use and interpret. For example, here is a profile of a bit of ddply example code and the resulting profr plot:
library(profr)
p <- profr(
ddply(baseball, .(year), "nrow"),
0.01
)
plot(p)
You can immediately see the following:
How ddply calls ldply, llply and loop_apply.
Inside loop_apply there is a .Call function.
You can confirm this by reading the source code for loop_apply:
> plyr:::loop_apply
function (n, f, env = parent.frame())
{
.Call("loop_apply", as.integer(n), f, env)
}
<environment: namespace:plyr>
Edit. There is something very odd about the ggplot.profr method. I have proposed the following fix to Hadley. (You may wish to try this on your example.)
ggplot.profr <- function (data, ..., minlabel = 0.1, angle = 0){
if (!require("ggplot2", quiet = TRUE))
stop("Please install ggplot2 to use this plotting method")
data$range <- diff(range(data$time))
ggplot(as.data.frame(data), aes(y=level)) +
geom_rect(
#aes(xmin=(level), xmax=factor(level)+1, ymin=start, ymax=end),
aes(ymin=level-0.5, ymax=level+0.5, xmin=start, xmax=end),
#position = "identity", stat = "identity", width = 1,
fill = "grey95",
colour = "black", size = 0.5) +
geom_text(aes(label = f, x = start + range/60),
data = subset(data, time > max(time) * minlabel), size = 4, angle = angle, vjust=0.5, hjust = 0) +
scale_x_continuous("time") +
scale_y_continuous("level")
}

It seems that the short answer is "No" and the long answer is "Yes, but you're not going to enjoy this." Even answering this question is going to take some time (so stick around, I may be updating it).
There are several basic things to get one's head around when working with profiling in R:
First, there are many different ways to think about profiling. It is quite typical to think in terms of a call stack. At any given instant, this is the sequence of function calls that are active, essentially nested within each other (subroutines, if you will). This is quite useful for understanding the state of evaluations, where functions will return, and lots of other things that are important for seeing things as the computer / interpreter / OS may see them. Rprof does call stack profiling.
Second, a different perspective is that I've got a bunch of code and a particular call is taking a long time: which line in my code caused that call to be made? This is line profiling. R doesn't have line profiling, as far as I can tell. This is in contrast with Python and Matlab, which both have line profilers.
Third, the map from from lines to calls is surjective, but it is not bijective: given a particular call stack we cannot guarantee that we can map it back to the code. In fact, call stack analyses often summarize the calls completely out of context of the whole stack (i.e. cumulative times are reported no matter where that call was on all of the different stacks in which it occurred).
Fourth, even though we have these constraints, we can put on our statistical hats and analyze the call stack data carefully and see what we can make of it. The call stack information is data and we like data, don't we? :)
Just a quick intro to a call stack. Let's just assume that our call stack looked like this:
"C" "B" "A"
This means that function A called B which then calls C (the order is reversed), and the call stack is 3 levels deep. In my code, the call stack gets to as many as 41 levels deep. Since the stacks can be so deep and are presented in reverse order, this is more interpretable by software than by a human. Naturally, we begin cleaning and transforming this data. :)
Now, our data really comes along looking like:
".Call" "subCsp_cols" "[" "standardGeneric" "[" "eval" "eval" "callGeneric"
"[" "standardGeneric" "[" "myFunc2" "myFunc1" "eval" "eval" "doTryCatch"
"tryCatchOne" "tryCatchList" "tryCatch" "FUN" "lapply" "mclapply"
"<Anonymous>" "%dopar%"
Miserable, isn't it? It even has duplicates of things like eval, some guy called <Anonymous> - probably some darn hacker. (Anonymous is legion, by the way. :-))
The first step in transforming this into something useful was to split each line of Rprof() output and reverse the entries (via strsplit and rev). The first 12 entries (last 12 if you look at the raw call stack, rather than the post-rev version) were the same for every line (of which there were about 12000, the sampling interval was 0.5 seconds - so about 100 minutes of profiling), and these can be discarded.
Remember, we're still interested in knowing which line(s) led to .Call, which took so much time. Before we get to that question, we put on our statistical caps: the profiling reports, e.g. from summaryRprof, profr, ggplot, etc., only reflect the cumulative time spent for a given call or for calls beneath a given call. What does this cumulative information not tell us? Bingo: whether that call was made many times, or a few, and whether the time spent was constant over all invocations of that call or whether there are some outliers. A particular function might be executed 100 times or 100K times, but all of the cost may come from a single invocation (it shouldn't, but we don't know until we look at the data).
This only begins to describe the fun. The A->B->C example doesn't reflect the way things may really appear, such as A->B->C->D->B->E. Now, "B" may be counted a couple of times. What's more, suppose that a lot of time is spent in the C level, but we never sample at precisely that level, only seeing its child calls in the stack. We may see a sizable time for "total.time", but not for "self.time". If there are lots of different child calls under C, we may lose sight of what to optimize - should we take out C altogether or tweak the children, B, D, and E?
Just to account for the time spent, I took the sequences and ran them through digest, storing counts for the digested values, via hash. I also split up the sequences, storing {(A),(A,B), (A,B,C), etc.}. This doesn't seem so interesting, but removing singletons from the counts helps a lot in cleaning up the data. We can also store the time spent in each call by using rle(). This is useful for analyzing the distribution of time spent for a given call.
Still we're nowhere closer to finding the actual time spent per line of code. We'll never get lines of code from the call stack. A simpler way to do this is to store a list of times throughout the code, which stores the output of proc.time(), for a given invocation. Taking the difference of these times reveals which lines or sections of code are taking a long time. (Hint: that's what we're really looking for, not the actual calls.)
However, we have this call stack and we might as well do something useful. Going up the stack is somewhat interesting, but if we rewind the profile information to a little earlier, we can find which calls tend to precede the longer running calls. This allows us to look for landmarks in the call stack - positions where we can tie a call to a particular line of code. This makes it a bit easier to map more calls back to code, if all we have is the call stack, rather than instrumented code. (As I keep mentioning: out of context, there isn't a 1:1 mapping, but at a fine enough granularity, especially in repeatedly hit calls that are distinctive, you may be able to find landmarks in the calls that map to code.)
Altogether, I was able to find which calls were taking a lot of time, whether that was based on 1 long interval or many small ones, what the distribution of time spent was like, and, with some effort, I was able to map the most important & time consuming calls back to the code and discover which parts of the code could benefit the most from rewriting or from a change in algorithms.
Statistical analyses of the call stack is loads of fun, but investigating a particular call based on cumulative time consumption is not a very good way to go. The cumulative time consumed by a call is informative on a relative basis, but it doesn't enlighten us as whether one or many calls consumed this time, nor the depth of the call in the stack, nor the section of code responsible for the invocations. The first two things can be addressed via a bit more R code, while the latter is best pursued through instrumented code.
As R doesn't yet have line profilers like Python and Matlab, the simplest way to handle this is to just instrument one's code.

A line in a profile file might look like
"strsplit" ".parseTabix" ".readVcf" "readVcf" "standardGeneric" "readVcf" "system.time"
which says, reading right to left, that the outermost function was system.time, which invoked readVcf, which was an S4 generic that dispatched to a readVcf method, invoking a function .readVcf, which invoked .parseTabix, which finally called strsplit.
Here we read in the profile file, sort the lines, tally them up (using rle -- run length encoding), then select the six most common paths in the profile file
r = rle(sort(readLines("readVcf.Rprof"))
o = order(r$lengths, decreasing=TRUE)
r$values[head(o)]
This
r$lengths[head(o)]
tells us how many times each of those call stacks were sampled.
There are some common patterns that can help interpret this. Here's an S4 generic being dispatched to its method
"readVcf" "standardGeneric" "readVcf"
an lapply iterating over its function
"FUN" "lapply"
and a tryCatch surrounding a .Call
".Call" "doTryCatch" "tryCatchOne" "tryCatchList" "tryCatch"
Usually one tries to profile relatively small chunks of code, rather than a whole script, with the small chunk identified by, e.g., stepping through the code interactively or making some educated guesses about what parts are likely to be slow. The fact that .Call is the most commonly sampled function is not encouraging -- it suggests most of the time is already being spent in C. Likely your best bet will involve coming up with a better overall algorithm, rather than say a brute force approach.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex