Confusion about precision-recall curve and average precision - information-retrieval

I'm reading a lot about Precision-Recall curves in order to evaluate my image retrieval system. In particular, I'm reading this article about feature extractors in VLFeat and the Wikipedia page about precision-recall.
I understand that this curve is useful for evaluating system performance w.r.t. the number of elements retrieved. So we repeatedly compute precision and recall after retrieving the top element, then the top 2, the top 3, and so on... but my question is: when do we stop?
My intuition is: we stop when our list of retrieved elements has recall equal to 1, so we retrieve all the relevant elements (i.e. there are no false negatives, only true positives).
The same question applies to average precision: how many elements should be present in the retrieved result when computing it? If my previous intuition is correct, then we just need to find the smallest list such that recall is 1 and use it to compute AP.
I also wonder why the libraries for computing P-R curves don't show how this is implemented.

An information retrieval system with recall 1 would be a perfect system, which doesn't seem possible in practice! Precision-Recall curves are good when you need to compare two or more information retrieval systems. It's not about stopping when recall or precision reaches some value. A Precision-Recall curve shows the pair of recall and precision values at each cutoff (consider the top 3 or top 5 documents), and you can draw the curve up to any reasonable cutoff.
Curves closer to the perfect Precision-Recall curve indicate better performance than curves closer to the baseline; in other words, a curve that lies above another curve indicates a better performance level. If two Precision-Recall curves represent the performance of two IR systems, A and B, and A's curve lies above B's, then system A clearly outperforms system B.
Remember: the Precision-Recall curve is not only used for evaluating IR systems. It can also show how good your classifier is! For example, you can compute precision and recall for a binary classification task and plot the Precision-Recall curve, which gives a good estimate of your classifier's performance.
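If it helps to see the arithmetic, here is a minimal sketch (not VLFeat's implementation; the relevance labels and collection size are made up) of how the precision-recall pairs at each cutoff and the average precision are typically computed from a ranked result list:

# hypothetical example: 1 means the item at that rank is relevant
ranked_relevance = [1, 0, 1, 1, 0, 0, 1]   # relevance of the top-k retrieved items
total_relevant = 5                         # relevant items in the whole collection

precisions, recalls = [], []
hits = 0
for k, rel in enumerate(ranked_relevance, start=1):
    hits += rel
    precisions.append(hits / k)            # precision at cutoff k
    recalls.append(hits / total_relevant)  # recall at cutoff k

# average precision: mean of the precision values at the ranks of relevant items,
# divided by the total number of relevant items (so un-retrieved items cost you)
ap = sum(p for p, rel in zip(precisions, ranked_relevance) if rel) / total_relevant
print(list(zip(recalls, precisions)), ap)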
I would encourage you to look at this tutorial from Coursera; I believe it will make the Precision-Recall curve clearer.

Related

How do I numerically compare the Dymos solution to the simulated solution?

I want to conduct a convergence study for my Dymos optimization results where I vary the number of nodes and compare the simulated solution to the optimization solution. From what I understand, Dymos fits polynomials to the system dynamics to represent the timeseries solution. What is the best way to compare the polynomial trajectory of the optimization solution to the trajectory of the simulated solution? I specifically want to evaluate the difference between the two trajectories away from the collocation/control nodes... to show that the polynomial fitting actually represents the simulated solution. How would I access the polynomial fitting data?
Thanks in advance.
For some of our testing we have an assert_timeseries_near_equal function that treats the denser time series as the truth and tests that the less dense timeseries (usually the discrete solution) is reasonably close to it.
We're actually working on making this method a bit more explicit right now so it's a little easier for users to apply in general cases, such as comparing discrete solutions from two different cases.
In general, there are a few different ways you can test your collocation results against an explicit integration. You could just verify that the final states of the two solutions are reasonably close. Since the error tends to increase over the course of the trajectory, this is often good enough for a quick check. The downside of this approach is that it doesn't test that both solutions took the same path to the final condition.
To test the solution away from the nodes, I'd recommend the following: add a second timeseries output to the relevant phase that contains more segments or higher-order segments. This timeseries will have more nodes, and Dymos will interpolate from the solution's collocation grid onto this denser timeseries output grid. When comparing against the explicit simulation, the times, controls, and parameters should still match exactly, and you'll better capture the interpolating state polynomials vs. the explicitly simulated results.
There are other statistical methods out there for comparing timeseries that you can bring to bear, but visualizing the explicit trajectory plus some error bound as a "tube" into which we want to fit the discrete solution is usually how I handle it.
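As a rough illustration of the "interpolate onto the denser grid and compare" idea, here is a generic sketch (not the Dymos API; the trajectories below are made-up stand-ins for your simulated and optimized state histories):

import numpy as np

t_sim = np.linspace(0.0, 10.0, 500)       # dense time grid of the explicit simulation
x_sim = np.sin(t_sim)                     # stand-in for the simulated state history
t_col = np.linspace(0.0, 10.0, 25)        # sparse collocation-node times
x_col = np.sin(t_col)                     # stand-in for the discrete (optimized) solution

# interpolate the sparse solution onto the dense "truth" grid and look at the error;
# np.interp is linear, whereas Dymos interpolates with its per-segment polynomials
x_col_dense = np.interp(t_sim, t_col, x_col)
print("max pointwise error away from the nodes:", np.max(np.abs(x_col_dense - x_sim)))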

R - Approach to find outliers/artefacts in blood pressure curve

Do you guys have an idea how to approach the problem of finding artefacts/outliers in a blood pressure curve? My goal is to write a program, that finds out the start and end of each artefact. Here are some examples of different artefacts, the green area is the correct blood pressure curve and the red one is the artefact, that needs to be detected:
And this is an example of a whole blood pressure curve:
My first idea was to calculate the mean of the whole curve and the means over many short intervals of the curve, and then find where they differ. But the blood pressure varies so much that I don't think this could work, because it would find too many nonexistent "artefacts".
Thanks for your input!
EDIT: Here is some data for two example artefacts:
Artefact1
Artefact2
Without any data, all I can do is point you towards different methods.
First (without knowing your data, which is always a huge drawback), I would point you towards Markov switching models, which can be analysed using the HiddenMarkov-package, or the HMM-package. (Unfortunately the RHmm-package that the first link describes is no longer maintained)
You might find it worthwhile to look into Twitter's outlier detection.
Furthermore, there are many blogposts that look into change point detection or regime changes. I find this R-bloggers blog post very helpful for a start. It refers to the CPM-package, which stands for "Sequential and Batch Change Detection Using Parametric and Nonparametric Methods", the BCP-package ("Bayesian Analysis of Change Point Problems"), and the ECP-package ("Non-Parametric Multiple Change-Point Analysis of Multivariate Data"). You probably want to look into the first two as you don't have multivariate data.
Does that help you get started?
I can offer a graphical answer that does not use any statistical algorithm. From your data I observe that the "abnormal" sequences seem to contain either constant portions or, inversely, very large variations. Working on the derivative and setting limits on it could work. Here is a workaround:
require(forecast)
bp <- c(df2$BP)                          # raw blood pressure series
bp <- ma(bp, order = 50)                 # smooth out micro-variations with a moving average
bp <- bp[complete.cases(bp)]             # drop the NAs introduced at the ends by ma()
# flag large jumps in the derivative, then smooth the flags to bridge tiny gaps
is_abnormal <- ma(as.numeric(abs(diff(bp)) > 1), order = 10) > 0.1
is_abnormal <- c(FALSE, is_abnormal)     # diff() is one element shorter than bp
is_abnormal[is.na(is_abnormal)] <- FALSE
abnormal <- bp
abnormal[!is_abnormal] <- NA             # keep only the flagged portions
plot(x = seq_along(bp), y = bp, type = "l")
lines(x = seq_along(bp), y = abnormal, col = "red")
What it does: it first "smooths" the data with a moving average to prevent the micro-variations from being detected. Then it applies the "diff" function (derivative) and tests whether it is greater than 1 (this value has to be adjusted manually depending on the smoothing amplitude). Then, in order to get a whole "block" of abnormal sequence without tiny gaps, we smooth the boolean flags again and test whether they exceed 0.1, to better capture the boundaries of the zone. Finally, I overplot the spotted portions in red.
This works for one type of abnormality. For the other type, you could set a low threshold on the derivative instead, and play with the tuning parameters of the smoothing.

What is the purpose in this part of the Monte Carlo path tracing algorithm?

In all of the simple path-tracing algorithms that use lots of Monte Carlo samples, the path-tracing step randomly chooses between returning the emitted value for the current surface and continuing by tracing another ray from that surface's hemisphere (for example in the slides here). Like so:
TracePath(p, d) returns (r,g,b) [and calls itself recursively]:
Trace ray (p, d) to find nearest intersection p’
Select with probability (say) 50%:
Emitted:
return 2 * (Le_red, Le_green, Le_blue) // 2 = 1/(50%)
Reflected:
generate ray in random direction d’
return 2 * fr(d -> d’) * (n dot d’) * TracePath(p’, d’)
Is this just a way of using russian roulette to terminate a path while remaining unbiased? Surely it would make more sense to count the emissive and reflective properties for all ray paths together and use russian roulette just to decide whether to continue tracing or not.
And here's a follow-up question: why do some of these algorithms I'm seeing (like in the book 'Physically Based Rendering Techniques') only compute emission once, instead of taking into account all the emissive properties of an object? The rendering equation is basically
L_o = L_e + integral of (light exiting other surfaces into the hemisphere of this surface)
which seems to count the emissive properties both in this L_o and in the integral over all the other L_o's, so the algorithms should do the same.
In reality, the single emission-vs-reflection coin flip is a bit too simplistic. To answer the first question: the coin flip is indeed used to terminate the ray while remaining unbiased, but it leads to much higher variance (noise). The second question is a bit more complex...
In the abstract of Shirley, Wang and Zimmerman TOG 94, the authors briefly summarize the benefits and complexities of Monte Carlo sampling:
In a distribution ray tracer, the crucial part of the direct lighting
calculation is the sampling strategy for shadow ray testing. Monte
Carlo integration with importance sampling is used to carry out this
calculation. Importance sampling involves the design of
integrand-specific probability density functions which are used to
generate sample points for the numerical quadrature. Probability
density functions are presented that aid in the direct lighting
calculation from luminaires of various simple shapes. A method for
defining a probability density function over a set of luminaires is
presented that allows the direct lighting calculation to be carried
out with one sample, regardless of the number of luminaires.
If we start dissecting that abstract, here are some of the important points:
Lights aren't points: in real life, we're almost never dealing with a point light source (e.g., a single LED).
Shadows are usually soft: this is a consequence of the non-point lights. It's very rare to see a truly hard-edged shadow in real life.
Noise (especially bright sampling artifacts) is disproportionately distracting: humans have a lot of intuition about how things should look. Look at slide 5 (the glass sphere on a table) in the OP's linked presentation, and note the bright specks in the shadow.
When rendering for more visual realism, both of the sets of reflected visibility rays and lighting calculation rays must be sampled and weighted according to the surface's bidirectional reflectance distribution function.
Note that this is a guided sampling method that's distinctly different from the original question's "generate ray in random direction" method in that it is both:
More accurate: the images in the linked PDF suffer a bit from the PDF conversion process. Figure 10 is a reasonable representation of the original; note the lack of bright speckle artifacts that you will sometimes see (as in figure 5 of the original presentation).
Significantly faster: as the original presentation notes, unguided Monte Carlo sampling can take quite a while to converge. More sampling rays = much more computation = more time.
After reading the slides (thank you for posting), I'll amend my answer as best I can.
Is this just a way of using russian roulette to terminate a path
while remaining unbiased? Surely it would make more sense to count
the emissive and reflective properties for all ray paths together
and use russian roulette just to decide whether to continue tracing
or not.
Perhaps the emitted and reflected properties are treated differently because the reflected path depends on the incident path in a way that emitted paths do not (at least for a specular surface). Does the algorithm take a Bayesian approach and use prior information about the incidence angle as a prior for predicting the reflection angle? Or is this a Feynman-style integration over all paths to come up with a probability? It's hard to tell without digging deeper into the details of the theory.
My earlier black body comment is quite incorrect. I see that the slides talk about (R, G, B) components; black body emissivities are integrated over all wavelengths.
And here's a follow up question: why do some of these algorithms I'm
seeing (like in the book 'Physically Based Rendering Techniques')
only compute emission once, instead of taking in to account all the
emissive properties on an object? The rendering equation is
basically
L_o = L_e + integral of (light exiting other surfaces in to the
hemisphere of this surface)
A single emissivity for the surface would assume there's no functional dependence on wavelength or direction. I don't know how significant that is for rendering photo-realistic images.
The ones that are posted are certainly impressive. I wonder how different they would look if the complexities that you have in mind were included?
Thank you for posting a nice question - I'm voting it up. It's been a long time since I've thought about this kind of problem. I wish I could be more helpful.
Yes, that is a very basic implementation of Russian roulette, though normally the probability of terminating would take the path's light contribution into account (i.e. less light means the value contributes less to the final sum, so use a higher probability of terminating).
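To illustrate the difference, here is a toy, self-contained sketch (not the slides' renderer) on a zero-dimensional "scene" where every bounce emits E and reflects a fraction RHO of the next bounce, so the true answer is E / (1 - RHO). Emission is counted at every bounce, and Russian roulette only decides whether to keep tracing, with the surviving path reweighted so the estimator stays unbiased:

import random

E, RHO = 1.0, 0.6          # emission and albedo of the toy surface
P_SURVIVE = 0.75           # Russian-roulette survival probability (tunable)

def trace_path():
    radiance = E                                   # count emission at every bounce
    if random.random() > P_SURVIVE:                # terminate the path...
        return radiance
    return radiance + (RHO / P_SURVIVE) * trace_path()   # ...or continue, reweighted

estimate = sum(trace_path() for _ in range(100_000)) / 100_000
print(estimate, "vs analytic", E / (1 - RHO))      # estimate should be close to 2.5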

Filtering methods for complex oscillations

If I have a system of springs, not one but, for example, a 3-degree-of-freedom system of springs connected in some way with each other, I can write a system of differential equations for it, but it is impossible to solve in a general way. The question is: are there any papers or methods for filtering such complex oscillations, in order to get rid of the oscillations and recover the real signal as much as possible? For example, if I connect 3 springs in some way and push them to start the vibrations, or put some weight on them, and then take the vibrations from each spring, are there any filtering methods that make it easy to determine the weight of each mass (in case some mass is placed on top)? I am interested in filtering complex spring-like systems.
Three springs, six degrees of freedom? This is straightforward to solve using finite element methods and numerical integration: it's a system of six coupled ODEs, and you can apply any standard integrator, such as 5th-order Runge-Kutta.
I'd recommend doing an eigenvalue analysis of the system first to find out something about its frequency characteristics and normal modes. I'd also do an FFT of the dynamic forces you apply to the system. You don't mention any damping, so if you happen to excite your system at a frequency close to one of its natural frequencies (resonance), you might see some interesting behavior.
The dynamic equation has this general form (sorry, I don't have LaTeX here to make it look nice):
Ma + Kx = F
where M is the mass matrix (diagonal), a is the acceleration (2nd derivative of displacements w.r.t. time), K is the stiffness matrix, and F is the forcing function.
If you're saying you know the response, you'll have to pre-multiply by the transpose of the response function and try to solve for M. It's diagonal, so you have a shot at it.
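For what it's worth, here is a small sketch of the eigenvalue analysis suggested above, for an assumed chain of three unit masses coupled by springs between two fixed walls (not necessarily the poster's configuration). The natural frequencies and mode shapes of M a + K x = F come from the generalized eigenproblem K v = w^2 M v:

import numpy as np
from scipy.linalg import eigh

M = np.diag([1.0, 1.0, 1.0])                 # mass matrix (diagonal)
k = 10.0                                     # assumed spring stiffness
K = k * np.array([[ 2, -1,  0],
                  [-1,  2, -1],
                  [ 0, -1,  2]], float)      # chain of masses between two walls

w2, modes = eigh(K, M)                       # generalized symmetric eigenproblem
print("natural frequencies [rad/s]:", np.sqrt(w2))
print("mode shapes (columns):\n", modes)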
Are you connecting the springs in such a way that the behavior of the system is approximately linear? (e.g. at least as close to linear as are musical instrument springs/strings?) Is this behavior consistent over time? (e.g. the springs don't melt or break.) If so, LTI (linear time invariant) systems theory might be applicable. Given enough measurements relative to the number of degrees of freedom in the LTI system, one might be able to estimate a pole-zero plot of the system response, and go from there. Or something like a linear predictor might be useful.
Actually it is possible to solve the resulting system of differential equations as long as you know the masses, etc.
The standard approach is to use a Laplace Transform. In particular you start with a set of linear differential equations. Add variables until you have a set of first order linear differential equations. (So if you have y'' in your equation, you'd add the equation z = y' and replace y'' with z'.) Rewrite this in the form:
v' = Av + w
where v is a vector of variables, A is a matrix, and w is a constant vector. (An example of something that winds up in w is gravity.)
Now apply a Laplace transform to get
s L(v) - v(0) = A L(v) + w / s
(the Laplace transform of the constant vector w is w / s)
Solve it to get
L(v) = inv(s I - A) (w / s + v(0))
where inv inverts a matrix and I is the identity matrix. Apply the inverse Laplace transform (if you read up on Laplace transforms you can find tables of inverse of common types of functions - getting a complete list of the functions you actually encounter shouldn't be that hard), and you have your solution. (Be warned, these computations quickly get very complex.)
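If the symbolic route gets unwieldy, a numerical integration of v' = A v + w is an easy cross-check on the Laplace-transform solution. A sketch with an assumed 2x2 system (a lightly damped spring under gravity):

import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [-4.0, -0.1]])        # position/velocity form of a damped spring
w = np.array([0.0, -9.81])          # constant forcing such as gravity
v0 = np.array([1.0, 0.0])           # initial displacement and velocity

sol = solve_ivp(lambda t, v: A @ v + w, t_span=(0.0, 10.0), y0=v0, dense_output=True)
print(sol.y[:, -1])                 # state at t = 10 s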
Now you have the ability to take a particular setup and solve for the future behavior. You also have the ability to (if you do things really carefully) figure out how the model responds to a small perturbation in parameters. But your problem is that you don't know the parameters to use. However you do have the ability to measure the positions in the system at repeated times.
If you put this together, what you can do is this: measure your positions at a number of times. First estimate all of the parameters from the initial values, then predict the values a second later and adjust your parameters (using Newton's method) until the prediction comes close enough to the measured values a second later. Then take the measurements from 5 seconds later, use that estimate as your starting point, and refine the parameters against what is happening 5 seconds later. Repeat with longer intervals to get all of your answers.
Writing and debugging this should take you some time. :-) I would strongly recommend investigating how much of this Mathematica knows how to do for you already...
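As a very rough sketch of that fitting loop (a hypothetical single undamped spring with unit mass, noise-free "measurements", and a least-squares solver standing in for Newton's method):

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

t_meas = np.linspace(0.0, 3.0, 40)                   # measurement times

def simulate(k, t):
    # undamped unit-mass spring: x'' = -k x, started at x = 1 with zero velocity
    sol = solve_ivp(lambda t_, v: [v[1], -k * v[0]], (t[0], t[-1]), [1.0, 0.0],
                    t_eval=t, rtol=1e-8, atol=1e-8)
    return sol.y[0]

x_meas = simulate(4.0, t_meas)                       # pretend measurements (true k = 4)
fit = least_squares(lambda p: simulate(p[0], t_meas) - x_meas, x0=[3.0])
print("recovered k:", fit.x[0])                      # should land near 4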

Calculus: how can you find an equation from a series of numbers?

I'm analyzing financial data and would like to find the inflection points of a line. I know I can do this using derivatives, but first I need an equation. Is there a way to generate an equation based on a series of numbers? I would need to do this programmatically.
Spline interpolation is probably more useful for you than polynomial interpolation: if you fit a polynomial, it must inevitably head off to +/- infinity outside your data range.
You will also want a method which allows a slightly loose fit: financial data is often a bit noisy which can result in very weird curves if you try to fit it exactly.
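A small sketch of that approach (toy noisy data and an arbitrary smoothing factor, not any particular library's recipe): fit a smoothing spline, then look for sign changes in its second derivative as candidate inflection points.

import numpy as np
from scipy.interpolate import UnivariateSpline

np.random.seed(0)
x = np.arange(100, dtype=float)
y = np.sin(x / 10.0) + np.random.normal(scale=0.1, size=x.size)   # noisy "prices"

spl = UnivariateSpline(x, y, k=4, s=1.0)        # s > 0 gives a deliberately loose fit
d2 = spl.derivative(n=2)(x)                     # second derivative of the spline
inflections = x[:-1][np.sign(d2[:-1]) != np.sign(d2[1:])]   # sign changes of y''
print(inflections)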
There are established procedures for turning a set of existing data points into a polynomial; this is called Polynomial Interpolation. This article in Wikipedia: http://en.wikipedia.org/wiki/Polynomial_interpolation
explains it mathematically. You can probably Google for algorithms easily enough.
Given enough points, your polynomial tracks the original, unknown function reasonably well, so the polynomial's inflection points should roughly coincide with the peaks and troughs of your data.
On the other hand, we all know there's not really a function behind financial data. So if I were you I'd scan along those points and find every point that has a smaller value to either side of it, and declare that a high; and vice versa for lows. Force-fitting this data into a fictitious function isn't going to make it any more useful.
Update: Tom Smith advises that spline interpolation is to be preferred to polynomial interpolation for this kind of thing, and Wikipedia bears him out. Or rather, it's bullish on his answer.
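A tiny sketch of the "just scan the points" idea above (hypothetical data): a point is a high if it is larger than both neighbours, and a low if it is smaller than both.

prices = [3, 5, 4, 6, 9, 7, 7, 8, 2]            # hypothetical series
highs = [i for i in range(1, len(prices) - 1)
         if prices[i] > prices[i - 1] and prices[i] > prices[i + 1]]
lows = [i for i in range(1, len(prices) - 1)
        if prices[i] < prices[i - 1] and prices[i] < prices[i + 1]]
print(highs, lows)                              # indices of local maxima and minima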
What you are thinking of is analytical calculus... when you have discrete data (i.e. points), you have to do it numerically. Now, a line usually doesn't have inflection points, so I guess you're thinking of a curve. You can either interpolate some kind of curve through the points and then calculate the derivative (also numerically, but over a larger number of points), or you can just calculate the derivative directly from the points you have (which is better depends on how many points you actually have).
But really, this is just theory since we don't know the nature of data, or the language or anything.
For more on the subject search: numerical analysis on wiki, and go from there.
I think curve fitting might help you in this case. Here is a discussion which might be handy.
cheers
