The parameters "atol" and "rtol" of the sklearn.neighbors.KernelDensity class default to 0. What does this mean?
Does it mean it uses all the data points to calculate the likelihood?
What will happen when they are not set to 0?
You can check the documentation of sklearn.
atol : float, default=0
The desired absolute tolerance of the result. A larger tolerance will generally lead to faster execution.
rtol : float, default=0
The desired relative tolerance of the result. A larger tolerance will generally lead to faster execution.
Intuitively, it means that when sklearn computes the kernel density, the program is allowed to stop early, before the result is exact, as long as the error stays within the given tolerance. A nonzero tolerance accepts some error in exchange for speed; it's a balance of time and accuracy. You can experiment with what range of atol/rtol you can accept during the development stage so you don't need to wait so long when testing the code.
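For instance, a quick side-by-side like the sketch below (the data, bandwidth and tolerance values are made up purely for illustration) shows the trade-off between speed and accuracy:

import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
X = rng.randn(100000, 1)                      # toy 1-D dataset
grid = np.linspace(-4, 4, 2000)[:, None]      # points to evaluate the density at

# Exact evaluation: atol=0, rtol=0 (the defaults)
kde_exact = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(X)

# Approximate evaluation: allow ~0.1% relative error per evaluated point
kde_fast = KernelDensity(kernel="gaussian", bandwidth=0.2, rtol=1e-3).fit(X)

log_dens_exact = kde_exact.score_samples(grid)    # slower, exact log-density
log_dens_fast = kde_fast.score_samples(grid)      # faster, approximate log-density

print(np.max(np.abs(np.exp(log_dens_exact) - np.exp(log_dens_fast))))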
I'm trying to assess the expected performance of calculating trigonometry functions as a function of the required precision. Obviously the wall-clock time depends on the speed of the underlying arithmetic, so I'm factoring that out by just counting the number of operations:
Using state-of-the-art algorithms, how many arithmetic operations (add, subtract, multiply, divide) should it take to calculate sin(x), as a function of the number of bits (or decimal digits) of precision required in the output?
... to assess the expected performance of calculating trigonometry functions as a function of the required precision.
Look at the first omitted term in the Taylor series for sine at x = π/4 as the order of the error.
Details: sin(x) usually has these phases:
Handling special cases: NaN, infinities.
Argument reduction to the primary range, say [-π/4 ... +π/4]. Really good reduction is hard, as π is irrational, and the code for it can account for 50% of sin() time; much of that time is spent emulating the extended precision needed. (Research K.C. Ng's "ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit")
Low-quality reduction involves much less: /, truncate, -, *.
Calculation over a limited range. This is what many only consider. If done with a Taylor series and 53 bits are needed, then about 10-11 terms are required. Yet quality code often uses a pair of crafted polynomials, each of about 4-5 terms, to form the quotient p(x)/q(x).
Of course dedicated hardware support in any of these steps greatly increases performance.
Note: code for sin() is often paired with cos() code, as extensive use of trig identities simplifies the calculation.
I'd expect a software solution for sin() to cost on the order of 25x a common *. This is a rough estimate.
To achieve a very low error rate in ULP, code typically uses a tad more. A sloppy sine_crap() could get by with only a few terms. So when assessing time performance, there is a trade-off with correctness. How good a sin() do you want?
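As a rough illustration of phases 2 and 3 (a sketch only: the function name is invented, the argument reduction is the naive kind mentioned above, and the short polynomials give only a few digits of accuracy, nothing like a real library implementation):

import math

def my_sin(x):
    # Naive argument reduction to [-pi/4, +pi/4]; real libraries work much
    # harder here because pi is irrational (see K.C. Ng's paper above).
    k = round(x / (math.pi / 2))          # nearest multiple of pi/2
    r = x - k * (math.pi / 2)             # reduced argument
    r2 = r * r
    if k % 4 in (1, 3):                   # these quadrants need cos(r)
        y = 1 - r2/2 + r2*r2/24 - r2*r2*r2/720
    else:                                 # these quadrants need sin(r)
        y = r * (1 - r2/6 + r2*r2/120 - r2*r2*r2/5040)
    return -y if k % 4 in (2, 3) else y   # fix the sign for the quadrant

print(my_sin(1.0), math.sin(1.0))         # should agree to several digits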
assess the expected performance of calculating trigonometry functions as a function of the required precision
Using the Taylor series as a predictor of the number of ops, with worst case x = π/4 (45°) and the error of the calculation on the order of the first omitted term of the series:
For 32-bit float, order 6 float ops needed.
For 64-bit double, order 9 float ops needed.
So if time scales with the square of the FP width, double is predicted to take 9/6*2*2, or 6 times, as long.
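A quick way to sanity-check figures of that order (a sketch: it only counts Taylor terms, not individual adds and multiplies, until the first omitted term at x = π/4 drops below the target precision):

import math

def taylor_terms_needed(eps, x=math.pi / 4):
    # Count sin(x) Taylor terms kept before the first omitted term is below eps.
    n, term = 0, x
    while abs(term) >= eps:
        n += 1
        k = 2 * n + 1                        # next odd power
        term = x ** k / math.factorial(k)    # first omitted term so far
    return n

print(taylor_terms_needed(2.0 ** -24))   # roughly the single-precision target
print(taylor_terms_needed(2.0 ** -53))   # roughly the double-precision target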
We can calculate any trigonometric function using a simple right-angled triangle or using the Maclaurin/Taylor series, so it really depends on which one you choose to implement. If you only pass an angle as an argument and wish to calculate the sin of that particular angle, it would take about 4 to 6 steps to calculate the sin using a unit circle.
I'm just curious: why, in IEEE-754, does any non-zero float divided by zero result in an infinite value? It seems like nonsense from the mathematical perspective, so I think the correct result for this operation should be NaN.
The function f(x) = 1/x is not defined at x = 0 when x is a real number. Similarly, the function sqrt is not defined for any negative number, and sqrt(-1.0f) in IEEE-754 produces a NaN value. Yet for some reason 1.0f/0 is Inf rather than NaN. There must be a reason for this, maybe some optimization or compatibility reasons.
So what's the point?
It's a nonsense from the mathematical perspective.
Yes. No. Sort of.
The thing is: Floating-point numbers are approximations. You want to use a wide range of exponents and a limited number of digits and get results which are not completely wrong. :)
The idea behind IEEE-754 is that every operation could trigger "traps" which indicate possible problems. They are
Illegal (senseless operation like sqrt of negative number)
Overflow (too big)
Underflow (too small)
Division by zero (The thing you do not like)
Inexact (This operation may give you wrong results because you are losing precision)
Now many people, like scientists and engineers, do not want to be bothered with writing trap routines. So Kahan, the primary architect of IEEE-754, decided that every operation should also return a sensible default value if no trap routines exist.
They are
NaN for illegal values
signed infinities for Overflow
signed zeroes for Underflow
NaN for indeterminate results (0/0) and signed infinities for x/0 with x != 0
normal operation result for Inexact
The thing is that in 99% of all cases zeroes are caused by underflow, and therefore in 99% of all cases Infinity is "correct" even if it is wrong from a mathematical perspective.
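You can see those default results directly; here is a minimal sketch using NumPy, which follows the IEEE-754 defaults instead of raising an exception the way plain Python division does:

import numpy as np

with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
    print(np.float64(1.0) / np.float64(0.0))       # inf  (division by zero)
    print(np.float64(-1.0) / np.float64(0.0))      # -inf
    print(np.float64(0.0) / np.float64(0.0))       # nan  (indeterminate)
    print(np.sqrt(np.float64(-1.0)))               # nan  (illegal operation)
    print(np.float64(1e308) * np.float64(10.0))    # inf  (overflow)
    print(np.float64(1e-308) * np.float64(1e-308)) # 0.0  (underflow)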
I'm not sure why you would believe this to be nonsense.
The simplistic definition of a / b, at least for non-zero b, is the unique number of bs that has to be subtracted from a before you get to zero.
Expanding that to the case where b is zero, the number of zeroes that would have to be subtracted from any non-zero number to get to zero is indeed infinite, because you'll never get to zero.
Another way to look at it is to talk in terms of limits. As a positive number n approaches zero, the expression 1 / n approaches "infinity". You'll notice I've quoted that word because I'm a firm believer in not propagating the delusion that infinity is actually a concrete number :-)
NaN is reserved for situations where the number cannot be represented (even approximately) by any other value (including the infinities); it is considered distinct from all those other values.
For example, 0 / 0 (using our simplistic definition above) can have any number of bs subtracted from a to reach 0. Hence the result is indeterminate - it could be 1, 7, 42, 3.14159 or any other value.
Similarly, things like the square root of a negative number, which has no value on the real line used by IEEE-754 (you have to go to the complex plane for that), cannot be represented.
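A quick illustration of that "distinct from all other values" behaviour (plain Python; math.isnan is the only reliable test):

import math

nan = float("nan")
print(nan == nan)            # False: NaN is not even equal to itself
print(nan == math.inf)       # False: distinct from the infinities too
print(nan < 1.0, nan > 1.0)  # False False: NaN is unordered
print(math.isnan(nan))       # True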
In mathematics, division by zero is undefined because zero has no sign, therefore two results are equally possible, and exclusive: negative infinity or positive infinity (but not both).
In (most) computing, 0.0 has a sign. Therefore we know which direction we are approaching from, and what sign the infinity would have. This is especially true when 0.0 represents a non-zero value too small to be expressed by the system, as is frequently the case.
The only time NaN would be appropriate is if the system knew with certainty that the denominator was truly, exactly zero. And it can't know that unless there is a special way to designate it, which would add overhead.
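A small demonstration of that signed zero (a sketch using NumPy, since plain Python raises ZeroDivisionError instead of returning the IEEE-754 infinities):

import numpy as np

pos_zero = np.float64(0.0)
neg_zero = np.float64(-0.0)
print(pos_zero == neg_zero)                         # True: they compare equal...
print(np.signbit(pos_zero), np.signbit(neg_zero))   # False True: ...but carry a sign

with np.errstate(divide="ignore"):
    print(np.float64(1.0) / pos_zero)               # inf:  approaching zero from above
    print(np.float64(1.0) / neg_zero)               # -inf: approaching zero from below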
NOTE:
I rewrote this following a valuable comment from @Cubic.
I think the correct answer to this has to come from calculus and the notion of limits. Consider the limit of f(x)/g(x) as x->0 under the assumption that g(0) == 0. There are two broad cases that are interesting here:
If f(0) != 0, then the limit as x->0 is either plus or minus infinity, or it's undefined. If g(x) takes both signs in the neighborhood of x==0, then the limit is undefined (left and right limits don't agree). If g(x) has only one sign near 0, however, the limit will be defined and be either positive or negative infinity. More on this later.
If f(0) == 0 as well, then the limit can be anything, including positive infinity, negative infinity, a finite number, or undefined.
In the second case, generally speaking, you cannot say anything at all; arguably, NaN is the only viable answer there.
Now in the first case, why choose one particular sign when either is possible or it might be undefined? As a practical matter, it gives you more flexibility in cases where you do know something about the sign of the denominator, at relatively little cost in the cases where you don't. You may have a formula, for example, where you know analytically that g(x) >= 0 for all x, say, for example, g(x) = x*x. In that case the limit is defined and it's infinity with sign equal to the sign of f(0). You might want to take advantage of that as a convenience in your code. In other cases, where you don't know anything about the sign of g, you cannot generally take advantage of it, but the cost here is just that you need to trap for a few extra cases - positive and negative infinity - in addition to NaN if you want to fully error check your code. There is some price there, but it's not large compared to the flexibility gained in other cases.
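Numerically you can watch that happen; here is a toy sketch (f(0) = 2 and g(x) = x*x are invented for the example) showing the quotient grow and then hit the signed infinity once g underflows to 0.0:

import numpy as np

f0 = np.float64(2.0)                        # f(0) != 0
with np.errstate(divide="ignore"):
    for x in [1e-1, 1e-3, 1e-5, 1e-200]:
        g = np.float64(x) * np.float64(x)   # g(x) = x*x >= 0 for all x
        print(x, f0 / g)                    # grows, then +inf once g underflows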
Why worry about general functions when the question was about "simple division"? One common reason is that if you're computing your numerator and denominator through other arithmetic operations, you accumulate round-off errors. The presence of those errors can be abstracted into the general formula format shown above. For example f(x) = x + e, where x is the analytically correct, exact answer, e represents the error from round-off, and f(x) is the floating point number that you actually have on the machine at execution.
I can see that my design variable exceeds its limits. (using COBYLA in this case)
I have a sample setup with single design variable where the optimum lies around 0.
I set the 'lower=0'.
I want this to be a very strict limit, because negative values yield NaN for my solver.
The optimizer goes, e.g.:
1, 2, 0, -0.125000000e-01, -1.56250000e-02, -1.95312500e-03, -2.44140625e-04
-3.05175781e-05, -3.81469727e-06, -5.00000000e-07
I am guessing this is optimizer-dependent? But is there a way to enforce the bound more strictly?
Unfortunately, COBYLA does not strictly respect variable bounds (see the scipy docs). The best you can do is to add them as linear constraints, and it will attempt to enforce them at the optimum point.
You can try SLSQP, though. It does strictly respect the bounds.
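If you are calling the scipy optimizers directly, the difference looks roughly like this (a sketch with a made-up objective whose constrained optimum sits exactly at the bound, so the solver is tempted to probe negative values):

import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Invented objective: unconstrained minimum at x = -1, constrained optimum at x = 0.
    return (x[0] + 1.0) ** 2

x0 = np.array([2.0])

# COBYLA: express the bound as an inequality constraint (x >= 0);
# intermediate iterates may still dip slightly below it.
res_cobyla = minimize(objective, x0, method="COBYLA",
                      constraints=[{"type": "ineq", "fun": lambda x: x[0]}])

# SLSQP: supports bounds directly and keeps the iterates inside them.
res_slsqp = minimize(objective, x0, method="SLSQP", bounds=[(0.0, None)])

print(res_cobyla.x, res_slsqp.x)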
I am writing a web crawler scheduler and have run into problems. First I will describe how I'm trying to find the optimal schedule for when my crawler visits a page, and then I will present my problem.
Scheduler definition
The scheduler is based on the paper "Optimal crawling strategies for web search engines" by J. Wolf. The paper proposes that the update times of web pages follow an exponential distribution with parameter λ. The problem is finding the optimal number of times xi that page i will be crawled in the time interval [0, T]. The function proposed is:
Because this function is convex and its input arguments xi are discrete, this kind of problem can be solved using the algorithm suggested by Frederickson and Johnson in "The Complexity of Selection and Ranking in X + Y and Matrices with Sorted Columns", which has time complexity O(max{N, log(R/N)}). The optimization algorithm solves the problem by finding the N-th element in an [R x N] matrix whose element at position (i, j) is equal to the derivative of the j-th function at x = i, where the derivative dj(xi) is equal to:
Because the function fi is convex, the function di is monotonically increasing (the matrix has sorted columns).
Problems
I run into problems when evaluating the derivative: because of rounding errors, d(x+1) - d(x) is not guaranteed to be greater than or equal to 0, so I'm not sure that the values I got from the optimizer are optimal. The rounding errors happen because x can only be a positive integer in the range from 0 to a few billion, so the argument of the exponential in f is a very large negative number (on the order of -5000) and the exponential itself is extremely small.
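To make the rounding issue concrete, here is a toy illustration (the numbers are invented; the real terms come from the derivative above). Terms of the form a*exp(b) with b around -5000 underflow to exactly 0.0 in double precision, so direct comparisons and differences lose all information, whereas comparing log(a) + b preserves the ordering, though, as noted under Failed Attempts, the log form can itself lose precision when the terms are very close:

import math

print(math.exp(-5000.0))                        # 0.0: underflows, information lost

a1, b1 = 2.0, -4999.0                           # toy term a1*exp(b1)
a2, b2 = 3.0, -5000.0                           # toy term a2*exp(b2); a1*exp(b1) is truly larger
print(a1 * math.exp(b1) > a2 * math.exp(b2))    # False: both sides underflowed to 0.0
print(math.log(a1) + b1 > math.log(a2) + b2)    # True: log space keeps the ordering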
Failed Attempts
The first thing I tried: I downloaded an arbitrary-precision library. This solved my problem, but the overhead of the library is too big.
The second thing I tried: I expanded d and got a function like:
and then I tried to compare dj(xi) and dk(xw) by comparing their terms individually and deducing from that whether dj is bigger or smaller than dk. If I could compare the derivatives, I could solve my problem, because the optimization algorithm does not need concrete values; it only needs the relations between them. I couldn't find a solution because of the term w.
I also tried looking at log(dj(xi)), because log preserves monotonicity, but the log also had rounding errors and I couldn't compare log(dj) and log(dk) without computing the final values.
If anybody has any other solution that could potentially work, I would be most grateful.
I am running clustering using the Mclust function. I need to get the number of iterations the algorithm used to reach its answer, but I can't seem to find it anywhere. I don't mind using another function that performs Gaussian mixture modelling via EM, as long as it provides the number of iterations as part of its output.
There seems to be no clear way to extract this, as far as I can tell. Here's a pretty hacky and approximate way to get at it though.
You can set the maximum number of iterations for EM using the control parameter,
x<-c(rnorm(100),rnorm(100,10,1))
mod<-Mclust(x,control = emControl(itmax=100))
The iteration limits are set to Inf by default, so the EM will terminate only when the log-likelihood changes in increments smaller than the tolerance. If you set itmax, the EM will still terminate, but will throw a warning that the algorithm stopped before reaching the tolerance bounds.
So you could adjust itmax a few times to get a sense of how many iterations are required before the EM is naturally terminating. For example,
mod<-Mclust(x,control = emControl(itmax=102))
throws a warning but
mod<-Mclust(x,control = emControl(itmax=103))
does not.
So it seems that 103 iterations are required to reach the exit conditions (with the default tolerance parameters).