How to pair events from different recordings?

I have what I think is an optimization problem that I have already solved. I suspect, however, that there is a more robust solution which I don't quite know how to implement. Maybe someone here has some ideas.
Problem Description
I need to pair events from two separate recordings. An event has a name, a time t at which it occurred, and an amplitude a, with the constraint that the next event occurs at t_next = t + a. The two recordings should produce events at the same times and amplitudes, but there are recording errors to deal with, like drift.
The task is to pair up corresponding events from the two recordings.
Here is an example:
Event Name   Recording 1 t   Recording 1 a   Recording 2 t   Recording 2 a
39770        155648.16       41.96           154726.4        41.75
39780        155690.12       42.01           154768.15       41.78
39790        155732.13       41.24           154809.94       41.13
39800        -               -               154851.06       5.74
39810        155773.37       1.55            -               -
39815        155774.03       1.55            -               -
39820        155774.92       3.75            -               -
39830        155778.67       0.65            -               -
39840        155779.32       22.35           154856.81       22.24
39850        155801.67       1.36            154879.04       1.27
39855        155802.4        1.36            154879.66       1.27
39860        155803.04       12.79           154880.31       12.74
As you can see, the t and a values are not exactly the same within a pair (same row), so I have to work with tolerances. Sometimes you have events in recording 1 that you don't have in recording 2, and vice versa. So how can I pair them?
Idea For Solution
I was thinking of a "rubberband" approach of some kind. Basically, you group events by amplitude within the allowed tolerance such that you maximize the number of paired-up events. But I'm not sure how to deal with events that are missing in one recording but present in the other. Somehow I have to skip over them, and I'm not sure how to do that without a combinatorial explosion of pairings to try. A sketch of what I mean is below.
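To make the idea concrete, here is a rough sketch of how I imagine it: a Needleman-Wunsch-style sequence alignment over the amplitudes, where pairing two events within tolerance scores a point and skipping an unmatched event in either recording costs a gap penalty. The names align_events, tol, and gap are made up; this is an illustration, not a finished solution:

align_events <- function(a1, a2, tol = 0.5, gap = 1) {
  n <- length(a1); m <- length(a2)
  # score[i+1, j+1] = best score aligning the first i events of recording 1
  # with the first j events of recording 2
  score <- matrix(0, n + 1, m + 1)
  score[, 1] <- -gap * (0:n)   # a prefix matched against nothing: all skips
  score[1, ] <- -gap * (0:m)
  for (i in 1:n) {
    for (j in 1:m) {
      pair <- if (abs(a1[i] - a2[j]) <= tol) score[i, j] + 1 else -Inf
      score[i + 1, j + 1] <- max(pair,                    # pair them up
                                 score[i, j + 1] - gap,   # skip event i in recording 1
                                 score[i + 1, j] - gap)   # skip event j in recording 2
    }
  }
  # Traceback: recover which events were paired in the best alignment
  pairs <- NULL; i <- n; j <- m
  while (i > 0 && j > 0) {
    if (abs(a1[i] - a2[j]) <= tol && score[i + 1, j + 1] == score[i, j] + 1) {
      pairs <- rbind(c(rec1 = i, rec2 = j), pairs)
      i <- i - 1; j <- j - 1
    } else if (score[i + 1, j + 1] == score[i, j + 1] - gap) {
      i <- i - 1
    } else {
      j <- j - 1
    }
  }
  pairs
}

# Amplitudes from the example table above
a1 <- c(41.96, 42.01, 41.24, 1.55, 1.55, 3.75, 0.65, 22.35, 1.36, 1.36, 12.79)
a2 <- c(41.75, 41.78, 41.13, 5.74, 22.24, 1.27, 1.27, 12.74)
align_events(a1, a2)   # pairs rows as in the table, skipping the unmatched events

The dynamic program is O(n*m), so the missing events don't cause a combinatorial explosion - but I'm not sure this is robust enough, which is why I'm asking.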
Has anyone come across this kind of problem and knows a robust solution for it?

Related

Understanding Event Coincidence Analysis from CoinCalc R package

I have two binarized event series (eventA and eventB) and I want to know whether there is any coincidence between these two events, so I'll use the new package CoinCalc to investigate the potential relation between them.
library(CoinCalc) # note that the package is not visible (at least for me) on CRAN; I got it from GitHub: https://github.com/JonatanSiegmund/CoinCalc
# two binary events
eventA= c(0,1,0,0,1,1,0,0,1,1,0,0,1,0,1,0,1,0,1,1,1,1,0,0,0,1,1,1,0,0,1,1,0,1,1,0,1,0,0,0,1,1,0,0,0,1,1,0,1,1,1,1,1,1,0,1,0,0,0,1,1,0,0,0,0,0,1,1,0,0,1,1,1,0,0,1,0,1,1,1,0,0,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,0,1,1,0,0,1,1,0,1,1,0,0,0,1,0,0,0,0,0,1,1,0,1,0,1,1,0,1,0,0,0,1,0,0,1,0,1)
eventB = c(0,1,0,0,0,0,1,0,0,0,1,1,1,1,1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,1,0,1,0,1,1,0,1,1,1,0,0,1,1,1,0,0,0,1,1,0,1,1,1,1,1,1,0,1,1,1,0,0,1,0,1,1,1,1,1,1,0,0,1,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,1,1,1,0,1,0,0,0,0,0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0,1,0,0,1,0,1,0,1,1,0,0,0)
# run ECA analysis
ca.out <- CC.eca.ts(eventA, eventB,delT=2,tau=2)
this yields:
$NH precursor
[1] TRUE

$NH trigger
[1] FALSE

$p-value precursor
[1] 0.2544052

$p-value trigger
[1] 0.003287963

$precursor coincidence rate
[1] 0.8243243

$trigger coincidence rate
[1] 0.9285714
I want to make sure I'm understanding this properly. Based on the results, the null hypothesis can only be rejected for the trigger, which is statistically significant at the 0.003 level, and the trigger coincidence rate is 0.92 (very high - is this equivalent to R^2?). Can this be interpreted as eventB having a strong influence on eventA, but not the other way around?
Then I can plot these two events using the CC.plot function:
CC.plot(eventA,eventB,dates=c(1900:2040),delT=2, tau=2, seriesAname = 'EventA', seriesBname = 'EventB')
Which yields:
Is there any way to modify the graphical parameters in CC.plot? The dummy years are not visible in this plot. I'd like to change fonts, sizes, colours, etc. Is there any way to draw the same figure by calling the model output (ca.out)?
Thanks in advance!
I'll try to answer your questions:
Question #1: The most important problem that I see in your example is that your events are not "rare". Therefore the most important precondition of the analytical significance test that you used by default (sigtest="poisson") is not fulfilled. Another "problem" is that the events in both series seem to be clustered (which may also be an effect of the high number of events). I would recommend using sigtest="shuffle.surrogate", which is more appropriate for this case. More information about the significance tests can be found in Siegmund et al. 2017 (http://www.sciencedirect.com/science/article/pii/S0098300416305489).
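That is, the same call as in your question, just with the surrogate significance test selected:

ca.out <- CC.eca.ts(eventA, eventB, delT=2, tau=2, sigtest="shuffle.surrogate")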
Executing this reveals that both coincidence rates are not significant. By the way: with such a high number of events it is extremely unlikely that you would ever get a significant coincidence rate, because the chance that simultaneities occur at random is very high.
Nevertheless, if the trigger coincidence rate were significant and the precursor rate were not, your interpretation would be a possible one.
Question #2: The problem with the plot is, again, that there are too many events (compared to what the method was originally designed for). This is why everything looks so messy. The function was meant to be more of an aid for explaining how the method works and what you have done.
If you, for example, only plot 20 years of your data
CC.plot(eventA[120:140],eventB[120:140],dates=c(2020:2040),delT=2, tau=2, seriesAname = 'EventA', seriesBname = 'EventB')
you will get a much better image, though due to the high event density of almost 50% it is still not very pretty.
[CoinCalc plot]
For now, there are no options to change the plot parameters. This might come for a future version of the package.
I hope that this helps you a bit!

Computer Adaptive Testing 1PL Ability Calculation Math: How to implement?

Preamble:
I have been implementing my own CAT system. The resources that have helped me most are these:
An On-line, Interactive, Computer Adaptive Testing Tutorial, 11/98 -- A good explanation of how to pick a test question based on which one would return the most information. Fascinating idea, really. The equations are not illustrated with examples, but there is a simulation to play with. Unfortunately, the simulation is down!
Computer-Adaptive Testing: A Methodology Whose Time Has Come -- This has similar equations, although it does not use IRT or the Newton-Raphson Method. It is also Rasch, not 3PL. It does, however, have a BASIC program that is far more explicit than the usual equations that are cited. I have converted portions of the program in order to get my own system to experiment with, but I would prefer to use 1PL and/or 3PL.
Rasch Dichotomous Model vs. One-parameter Logistic Model -- This clears some stuff up, but perhaps only makes me more dangerous at this stage.
Now, the question.
I want to be able to measure someone's ability level based on a series of questions that are rated at a 1PL difficulty level and of course the person's answers and whether or not they are correct.
First, I need a function that calculates the probability of a correct response to a given item. This equation gives the probability function for 1PL.
Probability correct = e^(ability - difficulty) / (1 + e^(ability - difficulty))
I'll go with this one arbitrarily for now. Using an ability estimate of 0, we get the following probabilities at each item difficulty:
-0.3 --> 0.574442516811659
-0.2 --> 0.549833997312478
-0.1 --> 0.52497918747894
0 --> 0.5
0.1 --> 0.47502081252106
0.2 --> 0.450166002687522
0.3 --> 0.425557483188341
This makes sense. A problem targeting their level is 50/50... and the questions are harder or easier depending on which direction you go. The harder questions have a smaller chance of coming out correct.
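For reference, here is that probability function in R (the name p is mine); it reproduces the table above for an ability estimate of 0:

p <- function(ability, difficulty) {
  exp(ability - difficulty) / (1 + exp(ability - difficulty))
}
p(0, seq(-0.3, 0.3, by = 0.1))
# 0.5744425 0.5498340 0.5249792 0.5000000 0.4750208 0.4501660 0.4255575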
Now... consider a test taker that has done five questions at these difficulties: -.1, 0, .1, .2, .1. Assume they got them all correct except the one at difficulty .2. Assuming an ability level of 0... I would want the equations to indicate that this person is above average.
So... how to calculate that with 1PL? This is where it gets hard.
Looking at the equations on the various pages... I will start with an assumed ability level... and then gradually adjust it after each question, more or less like the following.
Starting Ability: B0 = 0
Ability after problem 1: B1 = B0 + [summations and function evaluated for item 1 at ability B0]
Ability after problem 2: B2 = B1 + [summations and functions evaluated for items 1-2 at ability B1]
Ability after problem 3: B3 = B2 + [summations and functions evaluated for items 1-3 at ability B2]
Ability after problem 4: B4 = B3 + [summations and functions evaluated for items 1-4 at ability B3]
Ability after problem 5: B5 = B4 + [summations and functions evaluated for items 1-5 at ability B4]
And so on.
From reading papers on this, this is the gist of what the algorithm should be doing. But there are so many different ways to do it. The behaviour of my code is clearly wrong, as I get division-by-zero errors... so this is where I get lost. I've messed with information functions and taken derivatives, but my college-level math is not cutting it.
Can someone explain to me how to do this part? The literature I've read is short on examples and the descriptions of the math appears incomplete to me. I suppose I'm asking for how to do this with a 3PL model that assumes that c is always zero and a is always 1.7 (or maybe -1.7-- whatever works.) I was trying to get to 1PL somehow anyway.
Edit: A visual guide to item response theory is the best explanation of how to do this I've seen so far, but the text gets confusing at the most critical point. I'm closer to getting this, but I'm still not understanding something. Also... the pattern of summations and functions isn't in this text like I expected.
How to do this:
This is an inefficient solution, but it works and is reasonably intuitive.
The last link I mentioned in the edit explains this.
We are given a probability function, a set of question difficulties, and the corresponding set of evaluations - i.e., whether or not the test taker got each item correct.
With that, I can get a series of functions that each tell you the chance of the test taker giving that exact response. Now... multiply all of those functions together.
We now have a big mess! But it's a single function in terms of the unknown ability variable that we want to find.
Next... run a slew of numbers through this function. Whatever returns the maximum value is the test taker's ability level. This can be used to either determine the standard error or to pick the next question for computer adaptive testing.
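Here is a minimal brute-force sketch of the above in R, using the example from the question (all items correct except the one at difficulty .2); the function names and the grid bounds are arbitrary choices of mine:

# 1PL probability of a correct response
p <- function(ability, difficulty) {
  exp(ability - difficulty) / (1 + exp(ability - difficulty))
}

# Chance of the exact observed response pattern at a candidate ability:
# the product over items of p (if correct) or 1 - p (if incorrect)
likelihood <- function(ability, difficulties, correct) {
  probs <- p(ability, difficulties)
  prod(ifelse(correct, probs, 1 - probs))
}

difficulties <- c(-0.1, 0, 0.1, 0.2, 0.1)
correct      <- c(TRUE, TRUE, TRUE, FALSE, TRUE)

grid <- seq(-4, 4, by = 0.01)                  # the "slew of numbers"
L <- sapply(grid, likelihood, difficulties, correct)
grid[which.max(L)]                             # maximum-likelihood ability estimate

With these numbers the maximizer comes out around 1.45, i.e. clearly above average, as expected.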

Oh no Another BigO one

I've been doing Big-O recently, and I get the formulas OK, but I've written a piece of code that takes an input size and returns the time taken to complete a sort. So I have the input size and the time; how do I use these to classify which Big-O class the sort belongs to? I've made graphs and can see which class they match by comparing shapes, but I can't do it using the formula. I'm not strong on maths, which I think is my problem here!
For instance I get:
Size      Time    Operations
200          2         163648
400          1         162240
800         15        2489456
1600         6       10247376
3200        19       40858160
6400        79      165383984
12800      318      656588080
25600     1274     2624318128
51200     5059    10476803408
102400   20333    41969291968
I know that this is O(n^2) by looking at the graph and comparing, but how do I prove it?
Yes, you can sample a thousand different input sizes, and then try to derive a Big-O value from that, but you shouldn't - not only because it doesn't actually prove anything, but because that isn't the point.
The way to prove O(n^2) is to prove it on the code itself, not through experiments. The actual running time isn't important, because Big-O notation doesn't say anything about that - in simple terms, it only specifies the dominant term of whatever formula you would use to calculate the exact running time, in the sense of the number of operations executed for that function. Constants are thrown away, and so are smaller terms - the actual running time of a function might be 1000n^2+1000000n, but that's still O(n^2).
You can't mathematically prove anything from this table; the complexity might be O(1) if Time remains at 20333 for all larger values.
The best you can do is try fitting several curves to this table and selecting the best fit according to Occam's razor.
You can't prove it by looking at the timings; you can only prove it by analysing the code to see how many steps are performed. The reason for this is that the time taken is a function not only of your program but of many other things outside of your control as well.
For example, who can say whether your machine didn't spend an inordinate amount of time in other processes during one particular test run of your program? This sort of thing can be minimised to a point using statistical methods, but a proof requires more than experimental data.
What you can do is look at some of your data points to get support for the contention that it's O(n^2). Have a look at the last four entries:
Input     Time
12800       318
25600      1274    1274 / 318  = 4.006
51200      5059    5059 / 1274 = 3.971
102400    20333    20333 / 5059 = 4.019
You can see that each doubling of the input size multiplies the time by about 4, which would tend to indicate an O(n^2) property.
But this is support only. It applies only to that particular range of input values and, as stated, is subject to factors outside your control. Note also that the support would be harder to see if the time function were not a simple one. For example, if the time function were t = n^2/10 + 123n + 123456789, it would be a little harder to figure out.
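If you want a quick numerical version of that check, you can fit a straight line to log(time) against log(size); a slope near 2 is consistent with O(n^2). A sketch in R using the question's data (the first few runs are dropped because timer noise dominates them):

size <- c(3200, 6400, 12800, 25600, 51200, 102400)
time <- c(19, 79, 318, 1274, 5059, 20333)
fit <- lm(log(time) ~ log(size))
coef(fit)[2]   # estimated exponent; close to 2 here, supporting O(n^2)

Again, this is support, not proof.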
Just comparing the values may not make any sense. However, if you plot a graph using these values (x-axis: input, y-axis: time), you will get a curve, a linear shape, or something else. Using this information, you can predict the Big-O class of that function. Of course there may be (though not always) some interrupts that affect the running of the process, but they don't last for the whole period; they are slight overhead that should not affect the result.
In order to predict the Big-O class, you will need some calculus knowledge to make the analogy between the shape and the Big-O result.
For example, say you got a linear shape and you know that means O(n). You reach that conclusion because you know the shape of a linear function's graph and your graph looks like it. To get closer to a true proof, you have to draw both your function's curve and the graph of the mathematical function whose shape is closest to your graph.
There are other notations, like Big-Theta and Small-Omega, that bound your function from above or from below. The mathematical function could match either of them, but as a result, your Big-O function is the one closest to that shape.

Digging into R profiling information

I am trying to optimize a bit of code, and am puzzled about information from summaryRprof(). In particular, it looks like a number of calls are made to external C programs, but I'm not able to pin down which C program, from which R function. I am planning to resolve this through a bunch of slicing and dicing of the code, but wondered if I am overlooking some better way to interpret the profiling data.
The highest-consuming function is .Call, which is apparently a generic description for calls to C code; the next leading functions appear to be assignment operations:
$by.self
                  self.time self.pct total.time total.pct
".Call"              2281.0    54.40     2312.0     55.14
"[.data.frame"        145.0     3.46      218.5      5.21
"initialize"          123.5     2.95      217.5      5.19
"$<-.data.frame"      121.5     2.90      121.5      2.90
"as.vector"           110.5     2.64      416.0      9.92
I decided to focus on the .Call to see how this arises. I looked through the profiling file to find those entries with .Call in the call stack, and the following are the top entries in the call stack (by count of # of appearances):
13640 "eval"
11252 "["
7044 "standardGeneric"
4691 "<Anonymous>"
4658 "tryCatch"
4654 "tryCatchList"
4652 "tryCatchOne"
4648 "doTryCatch"
This list is as clear as mud: I have <Anonymous> and standardGeneric in there.
I believe this is due to calls to functions in the Matrix package, but that's because I'm looking at the code and that package appears to be the only possible source of C code. However, a lot of different functions from Matrix are called in this package, and it seems very difficult to determine which function is consuming this time.
So, my question is pretty basic: is there some way of deciphering and attributing these calls (e.g. .Call, <Anonymous>, etc.) in some other way? The plot of the call graph for this code is rather tricky to render, given the # of functions involved.
The fallback tactics I see are to either (1) comment out bits of code (and hack around to make the code work with this) to see where the time consumption occurs, or to (2) wrap certain operations inside of other functions and see when those functions appear on the call stack. The latter is inelegant, but it seems like it's the best way to add a tag to the call stack. The former is unpleasant because it takes quite some time to run the code, and iteratively uncommenting code and rerunning is unpleasant.
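For concreteness, the wrapping I have in mind is just a distinctively named pass-through function, something like this (the name and the wrapped operation are made up):

# A pass-through whose only job is to put a recognizable name on the call stack
tag_subset_step <- function(df, i, j) df[i, j]

Any time spent beneath it then shows up under tag_subset_step in the profile.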
May I suggest you use the profr package? This is another bit of Hadley magic. It's a wrapper around Rprof and gives a visualisation of the call stack and timings.
I find profr very easy to use and interpret. For example, here is a profile of a bit of ddply example code and the resulting profr plot:
library(profr)
library(plyr)  # ddply() and the baseball dataset come from plyr

p <- profr(
  ddply(baseball, .(year), "nrow"),
  0.01
)
plot(p)
You can immediately see the following:
How ddply calls ldply, llply and loop_apply.
Inside loop_apply there is a .Call function.
You can confirm this by reading the source code for loop_apply:
> plyr:::loop_apply
function (n, f, env = parent.frame())
{
    .Call("loop_apply", as.integer(n), f, env)
}
<environment: namespace:plyr>
Edit. There is something very odd about the ggplot.profr method. I have proposed the following fix to Hadley. (You may wish to try this on your example.)
ggplot.profr <- function(data, ..., minlabel = 0.1, angle = 0) {
  if (!require("ggplot2", quiet = TRUE))
    stop("Please install ggplot2 to use this plotting method")
  data$range <- diff(range(data$time))
  ggplot(as.data.frame(data), aes(y = level)) +
    geom_rect(
      # aes(xmin = (level), xmax = factor(level) + 1, ymin = start, ymax = end),
      aes(ymin = level - 0.5, ymax = level + 0.5, xmin = start, xmax = end),
      # position = "identity", stat = "identity", width = 1,
      fill = "grey95",
      colour = "black", size = 0.5) +
    geom_text(aes(label = f, x = start + range/60),
              data = subset(data, time > max(time) * minlabel),
              size = 4, angle = angle, vjust = 0.5, hjust = 0) +
    scale_x_continuous("time") +
    scale_y_continuous("level")
}
It seems that the short answer is "No" and the long answer is "Yes, but you're not going to enjoy this." Even answering this question is going to take some time (so stick around, I may be updating it).
There are several basic things to get one's head around when working with profiling in R:
First, there are many different ways to think about profiling. It is quite typical to think in terms of a call stack. At any given instant, this is the sequence of function calls that are active, essentially nested within each other (subroutines, if you will). This is quite useful for understanding the state of evaluations, where functions will return, and lots of other things that are important for seeing things as the computer / interpreter / OS may see them. Rprof does call stack profiling.
Second, a different perspective is that I've got a bunch of code and a particular call is taking a long time: which line in my code caused that call to be made? This is line profiling. R doesn't have line profiling, as far as I can tell. This is in contrast with Python and Matlab, which both have line profilers.
Third, the map from lines to calls is surjective, but it is not bijective: given a particular call stack, we cannot guarantee that we can map it back to the code. In fact, call stack analyses often summarize the calls completely out of context of the whole stack (i.e. cumulative times are reported no matter where a call sat on all of the different stacks in which it occurred).
Fourth, even though we have these constraints, we can put on our statistical hats and analyze the call stack data carefully and see what we can make of it. The call stack information is data and we like data, don't we? :)
Just a quick intro to a call stack. Let's just assume that our call stack looked like this:
"C" "B" "A"
This means that function A called B, which then called C (the order is reversed), and the call stack is 3 levels deep. In my code, the call stack gets to as many as 41 levels deep. Since the stacks can be so deep and are presented in reverse order, this is more interpretable by software than by a human. Naturally, we begin cleaning and transforming this data. :)
Now, our data really comes along looking like:
".Call" "subCsp_cols" "[" "standardGeneric" "[" "eval" "eval" "callGeneric"
"[" "standardGeneric" "[" "myFunc2" "myFunc1" "eval" "eval" "doTryCatch"
"tryCatchOne" "tryCatchList" "tryCatch" "FUN" "lapply" "mclapply"
"<Anonymous>" "%dopar%"
Miserable, isn't it? It even has duplicates of things like eval, some guy called <Anonymous> - probably some darn hacker. (Anonymous is legion, by the way. :-))
The first step in transforming this into something useful was to split each line of Rprof() output and reverse the entries (via strsplit and rev). The first 12 entries (last 12 if you look at the raw call stack, rather than the post-rev version) were the same for every line (of which there were about 12000, the sampling interval was 0.5 seconds - so about 100 minutes of profiling), and these can be discarded.
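That cleaning step might look roughly like this ("myrun.Rprof" is a placeholder filename):

lines  <- readLines("myrun.Rprof")[-1]        # drop the sample.interval header line
stacks <- lapply(strsplit(lines, " "), rev)   # one vector per sample, outermost call first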
Remember, we're still interested in knowing which line(s) led to .Call, which took so much time. Before we get to that question, we put on our statistical caps: the profiling reports, e.g. from summaryRprof, profr, ggplot, etc., only reflect the cumulative time spent for a given call or for calls beneath a given call. What does this cumulative information not tell us? Bingo: whether that call was made many times, or a few, and whether the time spent was constant over all invocations of that call or whether there are some outliers. A particular function might be executed 100 times or 100K times, but all of the cost may come from a single invocation (it shouldn't, but we don't know until we look at the data).
This only begins to describe the fun. The A->B->C example doesn't reflect the way things may really appear, such as A->B->C->D->B->E. Now, "B" may be counted a couple of times. What's more, suppose that a lot of time is spent in the C level, but we never sample at precisely that level, only seeing its child calls in the stack. We may see a sizable time for "total.time", but not for "self.time". If there are lots of different child calls under C, we may lose sight of what to optimize - should we take out C altogether or tweak the children, B, D, and E?
Just to account for the time spent, I took the sequences and ran them through digest, storing counts for the digested values, via hash. I also split up the sequences, storing {(A),(A,B), (A,B,C), etc.}. This doesn't seem so interesting, but removing singletons from the counts helps a lot in cleaning up the data. We can also store the time spent in each call by using rle(). This is useful for analyzing the distribution of time spent for a given call.
Still, we're no closer to finding the actual time spent per line of code. We'll never get lines of code from the call stack. A simpler way to do this is to store a list of timestamps throughout the code, recording the output of proc.time() at each point. Taking the differences of these times reveals which lines or sections of code are taking a long time. (Hint: that's what we're really looking for, not the actual calls.)
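A sketch of that instrumentation (the mark() helper and the section labels are made up for illustration):

timestamps <- list()
mark <- function(label) timestamps[[label]] <<- proc.time()[["elapsed"]]

mark("start")
# ... first section of the code ...
mark("after_section_1")
# ... second section of the code ...
mark("after_section_2")

diff(unlist(timestamps))   # elapsed seconds spent in each section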
However, we have this call stack and we might as well do something useful. Going up the stack is somewhat interesting, but if we rewind the profile information to a little earlier, we can find which calls tend to precede the longer running calls. This allows us to look for landmarks in the call stack - positions where we can tie a call to a particular line of code. This makes it a bit easier to map more calls back to code, if all we have is the call stack, rather than instrumented code. (As I keep mentioning: out of context, there isn't a 1:1 mapping, but at a fine enough granularity, especially in repeatedly hit calls that are distinctive, you may be able to find landmarks in the calls that map to code.)
Altogether, I was able to find which calls were taking a lot of time, whether that was based on 1 long interval or many small ones, what the distribution of time spent was like, and, with some effort, I was able to map the most important & time consuming calls back to the code and discover which parts of the code could benefit the most from rewriting or from a change in algorithms.
Statistical analysis of the call stack is loads of fun, but investigating a particular call based on cumulative time consumption is not a very good way to go. The cumulative time consumed by a call is informative on a relative basis, but it doesn't enlighten us as to whether one or many calls consumed this time, nor the depth of the call in the stack, nor the section of code responsible for the invocations. The first two things can be addressed via a bit more R code, while the latter is best pursued through instrumented code.
As R doesn't yet have line profilers like Python and Matlab, the simplest way to handle this is to just instrument one's code.
A line in a profile file might look like
"strsplit" ".parseTabix" ".readVcf" "readVcf" "standardGeneric" "readVcf" "system.time"
which says, reading right to left, that the outermost function was system.time, which invoked readVcf, which was an S4 generic that dispatched to a readVcf method, invoking a function .readVcf, which invoked .parseTabix, which finally called strsplit.
Here we read in the profile file, sort the lines, tally them up (using rle - run-length encoding), then select the six most common paths in the profile file:
r = rle(sort(readLines("readVcf.Rprof")))
o = order(r$lengths, decreasing=TRUE)
r$values[head(o)]
And this
r$lengths[head(o)]
tells us how many times each of those call stacks was sampled.
There are some common patterns that can help interpret this. Here's an S4 generic being dispatched to its method
"readVcf" "standardGeneric" "readVcf"
an lapply iterating over its function
"FUN" "lapply"
and a tryCatch surrounding a .Call
".Call" "doTryCatch" "tryCatchOne" "tryCatchList" "tryCatch"
Usually one tries to profile relatively small chunks of code, rather than a whole script, with the small chunk identified by, e.g., stepping through the code interactively or making some educated guesses about what parts are likely to be slow. The fact that .Call is the most commonly sampled function is not encouraging -- it suggests most of the time is already being spent in C. Likely your best bet will involve coming up with a better overall algorithm, rather than say a brute force approach.

Getting more info from Rprof()

I've been trying to dig into what the time-hogs are in some R code I've written, so I'm using Rprof. The output isn't yet very helpful though:
> summaryRprof()
$by.self
                 self.time self.pct total.time total.pct
"$<-.data.frame"      2.38     23.2       2.38      23.2
"FUN"                 2.04     19.9      10.20      99.6
"[.data.frame"        1.74     17.0       5.54      54.1
"[.factor"            1.42     13.9       2.90      28.3
...
Is there some way to dig deeper and find out which specific invocations of $<-.data.frame, and FUN (which is probably from by()), etc. are actually the culprits? Or will I need to refactor the code and make smaller functional chunks in order to get more fine-grained results?
The only reason I'm resisting refactoring is that I'd have to pass data structures into the functions, and all the passing is by value, so that seems like a step in the wrong direction.
Thanks.
The existing CRAN packages profr and proftools are useful for this. The latter can use Rgraphviz, which isn't always installable.
The R Wiki page on profiling has additional info and a nice script by Romain which can also visualize (but requires graphviz).
Rprof takes samples of the call stack at intervals of time - that's the good news.
What I would do is get access to the raw stack samples (stackshots) that it collects, pick several at random, and examine them. What I'm looking for is call sites (not just functions, but the places where one function calls another) that appear on multiple samples. For example, if a call site appears on 50% of samples, then that's what it costs, because its possible removal would save roughly 50% of total time. (Seems obvious, right? But it's not well known.)
Not every costly call site is optimizable, but some are, unless the program is already as fast as possible.
(Don't be distracted by issues like how many samples you need to look at. If something is going to save you a reasonable fraction of time, then it appears on a similar fraction of samples. The exact number doesn't matter. What matters is that you find it. Also don't be distracted by graph and recursion and time measurement and counting issues. What matters is, for each call site you see, the fraction of stack samples that show it.)
Parsing the output that Rprof generates isn't too hard, and then you get access to absolutely everything.
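For example, here is a sketch of that parsing, computing the fraction of samples on which a given call appears ("myrun.Rprof" is a placeholder filename):

samples <- readLines("myrun.Rprof")[-1]    # drop the sample.interval header line
stacks  <- strsplit(samples, " ")
frac <- function(fn) mean(vapply(stacks, function(s) fn %in% s, logical(1)))
frac('"$<-.data.frame"')   # fraction of samples showing $<-.data.frame on the stack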
