Where can I find the information about error ranges for trigonometric function instructions on x86 processors, like fsincos?
What you ask is rarely the interesting question; most likely you really want to know something different. So let me answer those other questions first:
How to calculate trigonometric function to a certain accuracy?
Just use a longer data type. With x86, if you need the result with double accuracy, do an 80-bit extended-precision calculation and you are on the safe side.
How to get platform-independent accuracy?
You need a specialized software solution for this, such as MPFR (a multiple-precision floating-point library).
That said, let me come back to your original question. The short answer: for small operands the result is typically within 1 ulp; for larger operands it gets worse. The only way to find out for sure is to test it for yourself, as this developer did. There is no reliable information from the processor vendors.
For Intel CPUs the accuracy of the built-in transcendental instructions is documented in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, section 8.3.10 Transcendental Instruction Accuracy:
With the Pentium processor and later IA-32 processors, the worst case error on transcendental functions is less than 1 ulp when rounding to the nearest (even) and less than 1.5 ulps when rounding in other modes.
It should be noted that the error bound of 1 ulp applies to the 80-bit extended-precision format, as all transcendental function instructions deliver extended-precision results. The issue noted by Stephen Canon in an earlier comment regarding a loss of accuracy, relative to a mathematical reference, for the trigonometric function instructions FSIN, FCOS, FSINCOS, and FPTAN, due to argument reduction with a 66-bit machine PI, is acknowledged by Intel. Guidance is provided as follows:
Regardless of the target precision (single, double, or double-extended), it is safe to reduce the argument to a value smaller in absolute value than about 3π/4 for FSIN, and smaller than about 3π/8 for FCOS, FSINCOS, and FPTAN. [...] For example, accuracy measurements show that the double-extended precision result of FSIN will not have errors larger than 0.72 ulp for |x| < 2.82 [...]
Likewise, the double-extended precision result of FCOS will not have errors larger than 0.82 ulp for |x| < 1.31 [...]
It is further acknowledged that the error bound of 1 ulp for the logarithmic function instructions FYL2X and FYL2XP1 only holds when y = 1 (this was not clear in some of Intel's older documentation):
The instructions FYL2X and FYL2XP1 are two operand instructions and are guaranteed to be within 1 ulp only when y equals 1. When y is not equal to 1, the maximum ulp error is always within 1.35
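Concretely (this is the standard way of measuring ulp error, not a quote from the manual): for an exact result y with 2^e ≤ |y| < 2^(e+1), one ulp in the 80-bit extended format (64-bit significand) is 2^(e-63), so the error of a computed result ŷ is

$$\mathrm{error\ in\ ulps} = \frac{|\hat{y} - y|}{2^{\,e-63}}, \qquad 2^e \le |y| < 2^{e+1}$$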
Using a multi-precision library, it is straightforward to put Intel's claims to the test. To collect the following data, I used Richard Brent's MP library as a reference and ran 2^31 random test cases in each of the intervals indicated:
Intel Xeon CPU E3-1270 v2 "IvyBridge", Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
2xm1 [-1,1] max. ulp = 0.898306 at x = -1.8920e-001 (BFFC C1BED062 C071D472)
sin [-2.82,+2.82] max. ulp = 0.706783 at x = 5.1323e-001 (3FFE 8362D6B1 FC93DFA0)
cos [-1.41,+1.41] max. ulp = 0.821634 at x = -1.3201e+000 (BFFF A8F8486E 591A59D7)
tan [-1.41,+1.41] max. ulp = 0.990388 at x = 1.3179e+000 (3FFF A8B0CAB9 0039C790)
atan [-1,1] max. ulp = 0.747328 at x = 1.2252e-002 (3FF8 C8BB9E06 B9EB4DF8), y = 3.9204e-001 (3FFD C8B8DC94 AA6655B4)
yl2x [0.5,2.0] max. ulp = 0.994396 at x = 1.0218e+000 (3FFF 82C95B56 8A70EB2D), y = 1.0000e+000 (3FFF 80000000 00000000)
yl2x [1.0,1.2] max. ulp = 1.202769 at x = 1.0915e+000 (3FFF 8BB70F1B C5F7E103), y = -9.8934e-001 (BFFE FD453A23 AC926478)
yl2xp1 [-0.7,1.44] max. ulp = 0.990469 at x = 2.1709e-002 (3FF9 B1D61A98 BF349080), y = 1.0000e+000 (3FFF 80000000 00000000)
yl2xp1 [-1, 1] max. ulp = 1.206979 at x = 9.1169e-002 (3FFB BAB69127 C1D5C158), y = -9.9281e-001 (BFFE FE28A91F 132F0C35)
While such non-exhaustive testing cannot prove error bounds, the maximum errors found appear to confirm Intel's documentation.
I do not have any modern AMD processors to test, but I do have test data for an old 32-bit Athlon CPU. Full disclosure: I designed the algorithms for the transcendental function instructions used in 32-bit Athlon processors. My accuracy target was less than 1 ulp for all the instructions; however, the same caveat about argument reduction by the 66-bit machine PI for the trigonometric functions, mentioned above, applies.
Athlon XP-2100 "Palomino", x86 Family 6 Model 6 Stepping 2, AuthenticAMD
2xm1 [-1,1] max. ulp = 0.720006 at x = 5.6271e-001 (3FFE 900D9E90 A533535D)
sin [-2.82, +2.82] max. ulp = 0.663069 at x = -2.8200e+000 (C000 B47A7BB2 305631FE)
cos [-1.41, +1.41] max. ulp = 0.671089 at x = -1.3189e+000 (BFFF A8D0CF9E DC0BCA43)
tan [-1.41, +1.41] max. ulp = 0.783821 at x = -1.3225e+000 (BFFF A947067E E3F4C39C)
atan [-1,1] max. ulp = 0.665893 at x = 5.5333e-001 (3FFE 8DA6B606 C58B206A) y = 5.5169e-001 (3FFE 8D3B9DC8 5EA87546)
yl2x [0.4,2.5] max. ulp = 0.716276 at x = 6.9826e-001 (3FFE B2C128C3 0EF1EC00) y = -1.2062e-001 (BFFB F7064049 BC362838)
yl2xp1 [-1,4] max. ulp = 0.691403 at x = 1.9090e-001 (3FFC C37C0397 F8184934) y = -2.4796e-001 (BFFC FDE93CA9 980BF78C)
The AMD64 Architecture Programmer’s Manual, Vol. 1, in section 6.4.5.1 Accuracy of Transcendental Results, documents the error bounds as follows:
x87 computations are carried out in double-extended-precision format, so that the transcendental functions provide results accurate to within one unit in the last place (ulp) for each of the floating-point data types.
You can read the Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 1, section 8.3.10 on Transcendental Instruction Accuracy. There is a precise formula, but also the more accessible statement:
With the Pentium processor and later IA-32 processors, the worst case error on transcendental functions is less than 1 ulp when rounding to the nearest (even) and less than 1.5 ulps when rounding in other modes.
I have a question about a specific implementation of the Nelder-Mead algorithm (1) that handles box constraints in an unusual way. I cannot find anything about it in any paper (25 papers), textbook (I searched 4 of them), or on the internet.
I have a typical optimisation problem: min f(x) with a box constraint -0.25 <= x_i <= 250
The expected approach would be to use a penalty function, making sure that f(x) is "unattractive" whenever x is out of bounds.
The algorithm in question works differently: it does not touch f(x) at all. Instead it distorts the parameter space with an inverse hyperbolic tangent, atanh, so the simplex algorithm can operate freely in an unbounded space and pick any point it likes. Before evaluating f(x) to assess the solution at x, the algorithm transforms back into the original space.
At first glance I found the idea ingenious: this way we avoid the disadvantages of penalty functions. But now I am having doubts. The distorted space affects termination behaviour: one termination criterion is the size of the simplex, and by inflating the parameter space with atanh we also inflate the simplex size.
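To make the mechanism concrete, here is a minimal R sketch of a tanh-type box transform of the kind described (not necessarily the exact form used inside nmkb; to_unbounded and to_bounded are illustrative names):

to_unbounded <- function(x, lower, upper)  # maps (lower, upper) onto (-Inf, Inf)
  atanh(2 * (x - lower) / (upper - lower) - 1)
to_bounded <- function(z, lower, upper)    # inverse map, back into the box
  lower + (upper - lower) * (tanh(z) + 1) / 2

# Near a bound the map stretches distances without limit, which is
# exactly the simplex-size inflation described above:
to_unbounded(c(0.5, 0.9, 0.99, 0.999), lower = 0, upper = 1)
# 0.000 1.099 2.298 3.453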
Experiments with the algorithm also show that it does not work as intended. I do not yet understand how this happens, but I do get results that are out of bounds; almost half of the returned local minima are out of bounds.
As an example, take a look at nmkb() optimising the Rosenbrock function when we gradually change the width of the box constraint:
library(dfoptim)   # provides nmkb()

rosbkext <- function(x) {
  # Extended Rosenbrock function
  n <- length(x)
  sum(100 * (x[1:(n-1)]^2 - x[2:n])^2 + (x[1:(n-1)] - 1)^2)
}

np <- 6  # 12
for (box in c(2, 4, 12, 24, 32, 64, 128)) {
  set.seed(123)
  p0 <- rnorm(np)
  p0[p0 > +2] <- +2 - 1E-8
  p0[p0 < -2] <- -2 + 1E-8
  ctrl <- list(maxfeval = 5E4, tol = 1E-8)
  o <- nmkb(fn = rosbkext, par = p0, lower = -box, upper = +box, control = ctrl)
  print(o$message)
  cat("f(", format(o$par, digits = 2), ") =", format(o$value, digits = 3), "\n")
}
The output shows that nmkb() claims to converge, yet in three cases it does not: for bounds of (-2, 2) and (-12, 12), which I might accept, but then it also fails at (-128, 128). I also tried the same with the unconstrained dfoptim::nmk(). No trouble there; it converges perfectly.
[1] "Successful convergence"
f( -0.99 0.98 0.97 0.95 0.90 0.81 ) = 3.97
[1] "Successful convergence"
f( 1 1 1 1 1 1 ) = 4.42e-09
[1] "Successful convergence"
f( -0.99 0.98 0.97 0.95 0.90 0.81 ) = 3.97
[1] "Successful convergence"
f( 1 1 1 1 1 1 ) = 1.3e-08
[1] "Successful convergence"
f( 1 1 1 1 1 1 ) = 4.22e-09
[1] "Successful convergence"
f( 1 1 1 1 1 1 ) = 8.22e-09
[1] "Successful convergence"
f( -0.99 0.98 0.97 0.95 0.90 0.81 ) = 3.97
Why does the constrained algorithm have more trouble converging than the unconstrained one?
Footnote (1): I am referring to the Nelder-Mead implementation used in the optimx package in R. That package calls the nmkb() function from the dfoptim package.
(This question has nothing to do with optimx, which is just a wrapper for R packages providing unconstrained optimization.)
The function in question is nmkb() from the dfoptim package of gradient-free optimization routines. The approach of transforming a bounded region into an unbounded space is a common one and can be applied with many different transformation functions, sometimes depending on the kind of boundary and/or the type of objective function. It may also be applied, for example, to transform unbounded integration domains into bounded ones.
The approach is problematic if the optimum lies on the boundary, because the optimal point is sent to (nearly) infinity and can never be reached: the routine will not converge, or the solution will be quite inaccurate.
If you think the algorithm is not working correctly, you should write to the authors of that package and -- this is important -- add one or two examples of what you think are bugs or incorrect solutions. Without explicit code examples no one here can help you.
(1) These transformations define bijective maps between bounded and unbounded regions, and the theory behind the approach is straightforward. You can read about possible transformations in books on multivariate calculus.
(2) The approach with penalties outside the bounds has its own drawbacks; for instance, the target function is no longer smooth at the boundary, so a method such as BFGS may no longer be appropriate.
(3) You could try the Hooke-Jeeves algorithm through the function hjkb() in the same dfoptim package. It will be slower, but it uses a different approach for treating the boundaries; no transformations are involved.
EDIT (after discussion with Erwin Kalvelagen above)
There appear to be local minima (with some coordinates negative).
If you set the lower bounds to 0, nmkb() will find the global minimum (1,1,1,1,1,1) in any case.
Watch out: the starting values have to be feasible, that is, all their coordinates must be greater than 0.
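In code, that amounts to something like the following sketch (p0_feasible is an illustrative name; the starting point must lie strictly inside the bounds):

p0_feasible <- pmax(p0, 1e-8)  # push non-positive coordinates just inside the bound
o <- nmkb(fn = rosbkext, par = p0_feasible, lower = 0, upper = box, control = ctrl)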
I am using the eigs() function in Julia to compute eigenvalues and eigenvectors. The results are non-deterministic and often full of 0.0. A temporary workaround is to link LAPACK 2.0.
Any idea how to do that on Ubuntu Linux? So far I have not been able to link it, and I do not have advanced Linux administration skills, so it would be good if someone could post a guide on how to link it correctly.
Thanks a lot.
Edit:
I wanted to add results, but I noticed one flaw in the code. I was using matrix = sparse(map(collect,zip([triple(e,"weight") for e in edges(g)]...))..., num_vertices(g), num_vertices(g)), which comes from an answer to one of my earlier questions. It works fine when the vertices are indexed from 1, but my vertices have arbitrary indexes because they are read from a file. So I changed num_vertices to the largest index. What I had not noticed is that the computation then treats, for example, 1000 vertices whenever the vertex with the maximum index is 1000, even though the whole graph might consist of just the 3 vertices 1, 10, and 1000. Any idea how to fix that?
Edit 2:
#Content of matrix = matrix+matrix'
[2, 1] = 10.0
[3, 1] = 14.0
[1, 2] = 10.0
[3, 2] = 10.0
[5, 2] = 2.0
[1, 3] = 14.0
[2, 3] = 10.0
[4, 3] = 20.0
[5, 3] = 20.0
[3, 4] = 20.0
[2, 5] = 2.0
[3, 5] = 20.0
[6, 5] = 10.0
[5, 6] = 10.0
matrix = matrix+matrix'
(d, v) = eigs(matrix, nev=1, which=:LR, maxiter=1)
5 executions of code above:
[-0.3483956604402672
-0.3084333257587648
-0.6697046040724708
-0.37450798643794125
-0.4249810113292739
-0.11882760090004019]
[0.3483956604402674
0.308433325758765
0.6697046040724703
0.3745079864379416
0.424981011329274
0.11882760090004027]
[-0.3483956604402673
-0.308433325758765
-0.669704604072471
-0.37450798643794114
-0.4249810113292739
-0.1188276009000403]
[0.34839566044026726
0.30843332575876503
0.6697046040724703
0.37450798643794114
0.4249810113292739
0.11882760090004038]
[0.34839566044026715
0.30843332575876503
0.6697046040724708
0.3745079864379412
0.4249810113292738
0.11882760090004038]
The algorithm is indeed non-deterministic (as is obvious from the example in the question). But there are two kinds of non-determinism in the results:
the complete sign reversals of the eigenvector.
small accuracy errors.
If a vector is an eigenvector, so is every scalar multiple of it (mathematically, an eigenvector spans a subspace of eigenvectors belonging to an eigenvalue). Thus, if v is an eigenvector, so is λv; when λ = -1 this is the sign reversal. But 2v is also an eigenvector. The eigs function normalizes the vectors to norm 1, so the only freedom left is this sign reversal. To remove this non-determinism, you can choose a sign for the first non-zero coordinate of the vector (say, positive) and multiply the eigenvector to make it so. In code:
v = v*sign(v[findfirst(v)])
Regarding the second source of non-determinism (inaccuracies), it is important to note that the true eigenvalues and eigenvectors are usually real numbers that cannot be represented exactly in Float64, so the returned values are always slightly off. If the required level of accuracy is low enough, rounding the values deterministically should make the resulting approximations identical. If this is not clear, consider an algorithm for calculating sqrt(2): it may non-deterministically return 1.4142135623730951 or 1.4142135623730949, but rounding to 5 decimal places always yields 1.41421.
The above should provide a guide to making the results more deterministic. But consider:
If there are multiple eigenvalues with the same value, the subspace of eigenvectors is more than 1 dimensional and there is more freedom to choose an eigenvector. This could make finding a deterministic vector (or vectors) to span this space more intricate.
Does the application really require this determinism?
(Thanks for the code bits - they do help. Even better when they can be quickly cut-and-pasted).
I'm trying to calculate the spectral norm of A. This seems straightforward, but the solver is telling me the problem is unbounded, which doesn't make sense, since y must have unit norm.
using Convex
using SCS
set_default_solver(SCSSolver(verbose=0))
A = [1 2; 3 4]
y = Variable(2)
expr = norm(A*y, 2)
constr = norm(y, 2) == 1
problem = maximize(expr, constr)
solve!(problem)
Am I missing something?
Edit: Removing solve!(problem) from the code (thus just setting the problem up) results in a warning that the problem is not DCP (disciplined convex programming) compliant. Yet since this just computes the spectral norm of A, it should be convex.
Running the following code gives me a NaN:
library(KernSmooth)
x <- c(5.84155992364115, 1.55292112974119, 0.0349665318792623, 3.93053647398094,
3.42790577684633, 2.9715553006801, 0.837108410045353, 2.872476865277,
3.89232548092257, 0.206399650539628)
y <- c(0.141415317472329, 1.34799648955049, 0.0297566221758204,
-0.966736679061812, 0.246306732122746, 0.557982376254723,
0.740542828791083, 0.162336127802977, -0.428804158514744,
0.691280978689863)
locpoly(x, y, bandwidth = 0.4821232, gridsize = 12, degree = 1)[['y']]
I get
[1] 0.3030137 0.6456624 0.9530586 1.1121106 0.8120947 0.4441603
[7] 0.1425592 -0.3600028 -0.7840411 -1.0517612 -1.2690134 NaN
On another computer, I get the same, except I get -0.7270521 instead of NaN. I am guessing that most of you will also get that. So the question is how do I fix my broken system? Does this have to do with my LAPACK or LIBBLAS?
Note that both computers mentioned above use Ubuntu. The one that gave NaN uses Ubuntu 13.10, the one that gave a number is on 12.04.
EDIT:
My new suspicion is that it is a floating point calculation issue:
A local polynomial regression is just a weighted linear regression, where the weights decrease the further a point is from the point of evaluation, in this case 5.84. A first thought is that, since the bandwidth is small, there are no points within the bandwidth. However, locpoly uses a Gaussian kernel, so all points have strictly positive weight. My guess is that the weights are so small that rounding or floating-point error becomes a problem, but I'm not sure how to fix that.
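To see how extreme this gets, one can inspect the Gaussian kernel weights at the rightmost evaluation point, reusing x and the bandwidth from the question (a quick check, not part of locpoly itself):

w <- dnorm((x - max(x)) / 0.4821232)  # kernel weights at the grid point 5.8416
round(w / sum(w), 6)
# Essentially all of the relative weight sits on the point at 5.84 itself;
# the next-nearest points (around 3.9) contribute only on the order of 1e-4.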
Not an answer, but I wanted to post a graph. I'm still not clear on what you expected to get from locpoly, but here it is.
Rgames> foo<-locpoly(x, y, bandwidth = 0.4821232, gridsize = 12, degree = 1)
Rgames> foo
$x
[1] 0.03496653 0.56283866 1.09071078 1.61858291 2.14645504 2.67432716
[7] 3.20219929 3.73007142 4.25794354 4.78581567 5.31368780 5.84155992
$y
[1] 0.3030137 0.6456624 0.9530586 1.1121106 0.8120947 0.4441603
[7] 0.1425592 -0.3600028 -0.7840411 -1.0517612 -1.2690134 NaN
My suspicion is that the last point on the far right diverges for the fitting parameters in use, and it was dumb luck that you got a non-NaN value under any OS.
If I am using Windows 7 and R 3.0, I get:
> locpoly(x, y, bandwidth = 0.4821232, gridsize = 12, degree = 1)[['y']]
[1] 0.3030137 0.6456624 0.9530586 1.1121106 0.8120947
[6] 0.4441603 0.1425592 -0.3600028 -0.7840411 -1.0517612
[11] -1.2690134 -2.8078788
So your issue wasn't there. However if I use R 3.0 on Ubuntu 13.04 (GNU/Linux 3.8.0-23-generic x86_64) I get:
> locpoly(x, y, bandwidth = 0.4821232, gridsize = 12, degree = 1)[['y']]
[1] 0.3030137 0.6456624 0.9530586 1.1121106 0.8120947 0.4441603
[7] 0.1425592 -0.3600028 -0.7840411 -1.0517612 -1.2690134 NaN
I tried experimenting and was able to get numbers very similar to what I got in Windows 7 by using:
> locpoly(round(x,3), round(y,3), bandwidth = 0.4821232, gridsize = 12, degree = 1)[['y']]
[1] 0.3032295 0.6459197 0.9533132 1.1121400 0.8118960 0.4437407
[7] 0.1422658 -0.3604210 -0.7848982 -1.0531299 -1.2710219 -0.7269588
So I hope that is able to solve your second problem.
To figure out why I was able to get a non-NaN answer on Windows but not on Ubuntu, we can look at http://cran.r-project.org/web/packages/KernSmooth/index.html and notice that:
MacOS X binary: KernSmooth_2.23-10.tgz
Windows binary: KernSmooth_2.23-11.zip
Naturally these are two different versions, and the Windows binary is one version ahead of the MacOS X binary. I checked the source code for the functions on Ubuntu and on Windows, and they look to be the same. However, I did find this: Rounding differences on Windows vs Unix based system in sprintf, showing a reported bug for rounding differences between Unix and Windows, although that question was asked 3 years ago. So I would say the difference might be the OS or the KernSmooth version (I lean toward the OS, as others have also encountered that issue).
I'm on Windows 7, R 3.0.1.
It does seem to be a floating-point issue, and it involves max(x): changing the first entry in x (which happens to be its max) from 5.84155992364115 to 5.841559923 turns your NaN into Inf, and to 5.84155992 turns your NaN into -0.7261049.
Also, setting the option truncate to FALSE changes the output considerably:
locpoly(x, y, bandwidth = 0.4821232, gridsize = 12, degree = 1, truncate=F)[['y']]
[1] 0.3030137 0.6456624 0.9530586 1.1121106 0.8120947 0.4441603 0.1425592 -0.3600028 -0.7449278 -0.3872891 -0.1235228 0.1414153
which I wouldn't have anticipated since you didn't specify range.x.
You're asking for a local polynomial of degree 1 (which requires at least 2 points to fit), and there is only one point local to 5.84155992364115. The real question is why it didn't give you a nice error telling you to increase the bandwidth. Nudge it up to 0.5 and it all works.
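For instance, with the same data and call as in the question, only the bandwidth nudged up as suggested:

locpoly(x, y, bandwidth = 0.5, gridsize = 12, degree = 1)[['y']]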
I would like to put it differently.
I am not a regular Ubuntu user, but NaN (Not a Number) is defined by the IEEE 754 floating-point standard, not by any particular OS or language.
First, I would update LAPACK and make sure all files are installed correctly (there was a recent bug); if some file is missing, the numbers may not be processed correctly.
A division by zero (or an invalid result due to a missing library) can produce NaN in the result.
I don't think Ubuntu itself has a problem with this.
Please specify the version of LAPACK for better understanding (including whether Ubuntu and LAPACK are 32- or 64-bit).
I hope this helps.
I have a series of data obtained through a molecular dynamics simulation, so the points are sequential in time and correlated to some extent. I can calculate the mean as the average of the data, and I want to estimate the error associated with a mean calculated in this way.
According to this book, I need to calculate the "statistical inefficiency", which is roughly the correlation time of the data in the series. For this I have to divide the series into blocks of varying length and, for each block length (t_b), calculate the variance of the block averages (v_b). Then, if the variance of the whole series is v_a (that is, v_b when t_b = 1), I have to obtain the limit of t_b*v_b/v_a as t_b tends to infinity, and that limit is the inefficiency s.
Then the error in the mean is sqrt(v_a*s/N), where N is the total number of points. In effect, this means that only one in every s points is uncorrelated.
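In symbols, with v_b the variance of the block averages at block length t_b, v_a the variance of the whole series, and N the number of points:

$$s = \lim_{t_b \to \infty} \frac{t_b\, v_b}{v_a}, \qquad \sigma_{\bar{x}} = \sqrt{\frac{v_a\, s}{N}}$$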
I assume this can be done with R, and maybe there's some package that does it already, but I'm new to R. Can anyone tell me how to do it? I have already found out how to read the data series and calculate the mean and variance.
A data sample, as requested:
# t(ps) dH/dl(kJ/mol)
0.0000 582.228
0.0100 564.735
0.0200 569.055
0.0300 549.917
0.0400 546.697
0.0500 548.909
0.0600 567.297
0.0700 638.917
0.0800 707.283
0.0900 703.356
0.1000 685.474
0.1100 678.07
0.1200 687.718
0.1300 656.729
0.1400 628.763
0.1500 660.771
0.1600 663.446
0.1700 637.967
0.1800 615.503
0.1900 605.887
0.2000 618.627
0.2100 587.309
0.2200 458.355
0.2300 459.002
0.2400 577.784
0.2500 545.657
0.2600 478.857
0.2700 533.303
0.2800 576.064
0.2900 558.402
0.3000 548.072
... and this goes on until 500 ps. Of course, the data I need to analyze is the second column.
Suppose x holds the sequence of data (e.g., the data from your second column).
v <- var(x)    # variance of the whole series (v_a)
m <- mean(x)
n <- length(x)
si <- c()
for (t in seq(2, 1000)) {   # block lengths t_b
  nblocks <- floor(n / t)
  # split the series into nblocks consecutive blocks of length t
  xg <- split(x[1:(nblocks * t)], factor(rep(1:nblocks, rep(t, nblocks))))
  # variance of the block averages (v_b)
  v2 <- sum((sapply(xg, mean) - m)^2) / nblocks
  si <- c(si, t * v2 / v)   # statistical inefficiency estimate: t_b * v_b / v_a
}
plot(si)
The image below shows what I got from some of my time-series data. You have reached the lower limit of t_b when the curve of si becomes approximately flat (slope = 0). See http://dx.doi.org/10.1063/1.1638996 as well.
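Once the curve has flattened, the plateau value can be turned into an error estimate along the lines of the formula in the question (a sketch; s_est and the tail length used to read off the plateau are judgment calls):

s_est <- mean(tail(si, 100))  # statistical inefficiency read off the flat tail
err <- sqrt(v * s_est / n)    # error in the mean: sqrt(v_a * s / N)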
There are a couple of different ways to calculate the statistical inefficiency, or integrated autocorrelation time. The easiest, in R, is with the coda package. It has a function, effectiveSize, which gives you the effective sample size: the total number of samples divided by the statistical inefficiency. The asymptotic estimator for the standard deviation of the mean is sd(x)/sqrt(effectiveSize(x)).
require('coda')
n_eff <- effectiveSize(x)
sd(x) / sqrt(n_eff)  # asymptotic standard error of the mean
Well, it's never too late to contribute to a question, is it?
As I do some molecular simulation myself, I stumbled upon this problem but did not see this thread earlier. I found that the method proposed by Allen & Tildesley seems a bit outdated compared to modern error-analysis methods. The rest of the book is good enough to be worth a look, though.
While Sunhwan Jo's answer is correct concerning the block-averages method, for error analysis you can also find other methods, like the jackknife and bootstrap methods (closely related to one another), here: http://www.helsinki.fi/~rummukai/lectures/montecarlo_oulu/lectures/mc_notes5.pdf
In short, with the bootstrap method you make a series of random artificial samples from your data and calculate the value you want on each new sample. I wrote a short piece of Python code to work some data out (don't forget to import numpy):
import numpy

def Bootstrap(data):
    B = 100                     # arbitrary number of artificial samplings
    means = numpy.zeros(B)
    sizeB = data.shape[0] // 4  # arbitrary resample size, proportional to that
                                # of your sampling (assuming a numpy array)
    for n in range(B):
        for i in range(sizeB):
            # if data is a multi-column array you may have to select the column
            # you use specifically; check the numpy.random.randint doc
            means[n] += data[numpy.random.randint(0, high=data.shape[0])]
        means[n] = means[n] / sizeB  # mean of this artificial sample
    es = numpy.std(means, ddof=1)   # spread of the resampled means
    return es
I know it can be improved, but it's a first shot. With your data, I get the following:
Mean = 594.84368
Std = 66.48475
Statistical error = 9.99105
I hope this helps anyone stumbling across this problem in the statistical analysis of data. If I got anything wrong (first post, and I'm no mathematician), any correction is welcome.