Unequally/ unevenly distributed random floats - math

How can I generate random floats between X and Y that are unequally distributed, so that numbers from a specific range within X to Y are more likely to be generated?
I did search a lot of keywords to find something like this including: unequally distributed random numbers, or uneven noise distribution, or biased random floats, or weighted random numbers ...
All I could find is how to randomly pick from a finite list of values, weighted so that some values are more likely to be chosen, but I'm looking to choose from an infinite range of floats between X and Y.
Also I found a lot of articles about how to NOT generate biased random numbers, which is the opposite of what I want.
As an example of WHAT I'm trying to do with these numbers: if you draw black noise in a white square, each noise dot is in a random location within the square. If you generate enough dots, you'll have an almost black square.
If you distribute the randomness with a higher probability in the middle of the square, you'll draw almost a soft black dot in the middle of the square. This is what I'm trying to generate.
So my questions are:
I'm sure these algorithms exist; what are they called?
Can anyone suggest a quick implementation, in any language?
How can you specify the weight of the bias? E.g. it's 2x more likely, or 5x more likely, to generate numbers in a specific range. In my example with the dot, I believe that if it's 5x more likely to get numbers in a range, the dot will be smaller and darker.
How can you specify the softness of the distribution? E.g. linear, logarithmic, quadratic. In my example with the dot, I believe it will make the dot softer or harder.
Thank you in advance!

These are generally called absolutely continuous distributions, and the following are two ways to define this kind of distribution.
As a probability density function, which is (roughly speaking) a function that gives the probability weight of any number. If you have a distribution in this form, some of the ways to sample from this distribution include piecewise linear interpolation, rejection sampling, and Markov chain Monte Carlo methods. For further information, see "Random Numbers from an Arbitrary Distribution".
As a list of weights of individual points from X to Y. The weights in between these points are linearly interpolated. One example of a way to sample from this distribution is given in C++ as std::piecewise_linear_distribution. See also "Piecewise Linear Distribution".
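As a rough illustration of the second form, here is a minimal Python sketch that interpolates a list of weights linearly and samples from the result by rejection sampling. The helper names (make_density, rejection_sample) and the example weights are made up for this illustration; they are not part of any library, and a production version would more likely use an inverse-CDF method.

```python
import random
from bisect import bisect_right


def make_density(points, weights):
    """Build f(x): the weight at x, linearly interpolated between the points."""
    def density(x):
        # Index of the segment [points[i], points[i + 1]] that contains x.
        i = bisect_right(points, x) - 1
        i = max(0, min(i, len(points) - 2))
        x0, x1 = points[i], points[i + 1]
        w0, w1 = weights[i], weights[i + 1]
        return w0 + (w1 - w0) * (x - x0) / (x1 - x0)
    return density


def rejection_sample(density, lo, hi, max_weight, rng=random):
    """Draw one float in [lo, hi] with probability proportional to density(x)."""
    while True:
        x = rng.uniform(lo, hi)
        # Accept x with probability density(x) / max_weight.
        if rng.uniform(0.0, max_weight) <= density(x):
            return x


# Values near 0.5 are weighted 5x more heavily than values near the edges.
points = [0.0, 0.4, 0.5, 0.6, 1.0]
weights = [1.0, 1.0, 5.0, 1.0, 1.0]
density = make_density(points, weights)
rng = random.Random(42)
samples = [rejection_sample(density, 0.0, 1.0, max(weights), rng) for _ in range(20000)]
```

Raising the peak weight makes the bias stronger, and changing the shape of the weight list (a wide ramp versus a narrow spike) changes how soft the transition is.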
For many popular distributions, such as the normal, beta, and gamma distributions, there are special methods for generating random numbers with those distributions. In fact, there are many different designs of such methods for the normal distribution. For numbers in a bounded range, the beta distribution is an ideal choice; its two parameters (alpha and beta) describe a wide variety of shapes that could suit your purposes. Python has a random.betavariate(alpha, beta) method for generating beta-distributed random numbers.
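A minimal sketch of the beta approach, assuming you simply rescale the result from [0, 1] to [X, Y]. The function name biased_float and the parameter values are illustrative only; random.betavariate is the standard-library call mentioned above.

```python
import random


def biased_float(x, y, alpha, beta, rng=random):
    """Return a float in [x, y] drawn from a beta distribution scaled to that range."""
    return x + (y - x) * rng.betavariate(alpha, beta)


rng = random.Random(7)
# alpha == beta > 1 gives a symmetric bump around the midpoint;
# larger values make the bump narrower (a smaller, darker dot).
soft = [biased_float(10.0, 20.0, 2.0, 2.0, rng) for _ in range(10000)]
sharp = [biased_float(10.0, 20.0, 20.0, 20.0, rng) for _ in range(10000)]
```

Unequal alpha and beta shift the peak towards one end of the range, so the same function covers off-centre bias as well.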

Related

The reason for using bias in networks?

It may be easy to see why, but I still don't understand why we use bias in a neural network. The weights' values will get changed, which is what lets the algorithm learn. So why use bias in all of this?
Because of linear equations.
Bias is another learned parameter. A single neuron will compute w*x + b where w is your weight parameter and b your bias.
Perhaps this helps you: Let's assume you are dealing with a 2D Euclidean space that you'd like to classify with two labels. You can do that by computing a linear function and then classifying everything below it with one label, and everything above it with another. If you did not use the bias you could only change the slope of your function, and your function would always pass through (0, 0). Bias gives you the possibility to define where that linear function intersects the y-axis for x=0, i.e. (0, y). E.g. without bias you could not separate data that can only be separated by a line with a non-zero y-intercept.
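To make this concrete, here is a small Python sketch using a one-input neuron w*x + b with a hard threshold. The toy data, the function names and the weight range searched are all assumptions made for the illustration.

```python
def neuron(x, w, b=0.0):
    """A single linear neuron followed by a hard threshold: class 1 if w*x + b > 0."""
    return 1 if w * x + b > 0 else 0


# Toy 1-D data: the label is 1 exactly when x > 3.
xs = [1.0, 2.0, 2.5, 3.5, 4.0, 5.0]
labels = [0, 0, 0, 1, 1, 1]


def accuracy(w, b):
    hits = sum(neuron(x, w, b) == y for x, y in zip(xs, labels))
    return hits / len(xs)


# With a bias the decision boundary can sit at x = 3.
with_bias = accuracy(w=1.0, b=-3.0)

# Without a bias the boundary is stuck at x = 0, whatever the weight.
best_without_bias = max(accuracy(w=w / 10.0, b=0.0) for w in range(-50, 51))
```

Here with_bias is a perfect score, while no weight on its own gets more than half of the points right, because all the inputs lie on the same side of zero.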

Spatstat, using the Matérn cluster process to generate homogeneous landscapes, how do I interpret the Ripley K function?

I am looking to develop a point process that ranges from homogeneous, i.e. no correlation between points to a point cluster process that does have correlation between points. From experimentation I can see that using the Matérn cluster process I can generate landscapes that are clustered.
library(spatstat)
plot(rMatClust(kappa=3,r=0.1,mu=50))
I want to use the simplest code that increases the level of homogeneity, i.e. decreases the dependence of points on each other. I do not want to use a binary model where the pattern is either homogeneous or not, i.e. just a Poisson process, which can be generated with:
plot(rpoispp(150))
From experimentation I noticed that if I increase the radius of the clusters using the Matérn cluster process, I do seem to create a pseudo homogeneous pattern.
plot(rMatClust(kappa=3,r=0.3,mu=50))
plot(rMatClust(kappa=3,r=0.7,mu=50))
Is this a good way of generating degrees of homogeneity? I understand that I can use statistical tests to measure the degree of clustering compared to a completely random Poisson process, such as the Ripley K test. For example, if I assign the Matérn cluster process data to variables, such as:
a<-rMatClust(kappa=3,r=0.1,mu=50)
b<-rMatClust(kappa=3,r=0.3,mu=50)
c<-rMatClust(kappa=3,r=0.7,mu=50)
Then use the Ripley K test and plot the results:
plot(Kest(a))
plot(Kest(b))
plot(Kest(c))
I can see that the difference between a homogeneous Poisson process and the clustered point process decreases. I still do not fully understand the significance of the various K values according to edge effects and so forth, and how to interpret the Ripley K function, but I think this is the right direction to be heading in? How do I interpret the Ripley K function? Another problem is the number of points in each plot: I do not have a consistent number of points in each plot, as can be seen by:
summary(a)
summary(b)
summary(c)
Any knowledgeable feedback on this is greatly appreciated.
The standard terminology is that you want to generate a clustered point pattern.
The function rMatClust generates a clustered point pattern at random, in a two-stage process. The first stage is to generate "parent" points completely at random. The second stage is to generate, for each "parent", a random number of "offspring" points, and to place the "offspring" points inside a circle of radius R around their "parent". The final result is the collection of all "offspring" points. From this description (and help(rMatClust)) you can figure out what happens for different parameter values.
The K function (not the "K test") is a summary of the spacing between points in a point pattern. At a distance r, the value of K(r) is the normalised average number of points observed to fall within distance r of a typical point in the pattern. It is normalised so that it does not depend on the number of points, making it possible to compare patterns with different numbers of points.
When you plot the K function, one of the curves is the theoretical curve that would be expected if the points are completely random, and the other curves are computed from the data point pattern. This allows you to assess whether the point pattern appears to be clustered.
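For intuition only, here is a naive Python sketch of what the K function measures. It applies no edge correction, so it is not a substitute for spatstat's Kest, and the helper clustered_points is a simplified stand-in for a cluster process (fixed cluster sizes, parents kept away from the border), not rMatClust itself. All names and parameter values are assumptions for the illustration.

```python
import math
import random


def ripley_k(points, r, area=1.0):
    """Naive estimate of K(r): area / (n * (n - 1)) * number of ordered pairs within r."""
    n = len(points)
    close_pairs = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) <= r:
                close_pairs += 2  # count (i, j) and (j, i)
    return area * close_pairs / (n * (n - 1))


def clustered_points(n_parents, n_offspring, radius, rng):
    """Fixed-size clusters: each parent gets n_offspring points in a disc around it."""
    points = []
    for _ in range(n_parents):
        # Keep parents away from the border so every offspring stays in the unit square.
        px = rng.uniform(radius, 1.0 - radius)
        py = rng.uniform(radius, 1.0 - radius)
        for _ in range(n_offspring):
            dist = radius * math.sqrt(rng.random())  # uniform over the disc
            angle = rng.uniform(0.0, 2.0 * math.pi)
            points.append((px + dist * math.cos(angle), py + dist * math.sin(angle)))
    return points


rng = random.Random(1)
uniform = [(rng.random(), rng.random()) for _ in range(400)]
clustered = clustered_points(n_parents=8, n_offspring=50, radius=0.05, rng=rng)

r = 0.05
k_poisson_theory = math.pi * r ** 2      # expected K(r) for complete spatial randomness
k_uniform = ripley_k(uniform, r)         # should be close to the theoretical value
k_clustered = ripley_k(clustered, r)     # should be clearly larger
```

A K(r) well above pi*r^2 at small r indicates clustering. Because the estimate is normalised by the number of points, the two patterns can be compared even though their point counts need not match.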
I strongly suggest you do some reading in Chapter 7 of the spatstat book. You can download this chapter for free.

Calculate volume from cross-sections

I have an irregularly shaped 3D object. I know the areas of this object's cross-sections at regular intervals. How can I calculate the volume of this object?
You can only approximate the volume. Just add up all the areas and then multiply by the distance between adjacent cross-sections.
Obviously the smaller the distance between cross-sections, the more accurate the volume. It is just integration (calculus).
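A minimal Python sketch of that sum, checked against a unit sphere. The function name and the choice of measuring each cross-section at the middle of its slab are assumptions for the illustration.

```python
import math


def volume_from_sections(areas, spacing):
    """Approximate the volume as the sum of slab volumes: area * thickness."""
    return sum(areas) * spacing


# Check against a unit sphere, sliced into 100 slabs and measured at each slab's middle.
n = 100
spacing = 2.0 / n
mid_heights = [-1.0 + (i + 0.5) * spacing for i in range(n)]
areas = [math.pi * (1.0 - z * z) for z in mid_heights]
estimate = volume_from_sections(areas, spacing)
exact = 4.0 / 3.0 * math.pi
```

With 100 slabs the estimate already agrees with 4/3*pi to about three decimal places.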
Discretize it using tetrahedra or bricks and add up their volumes, a la finite element methods. Integrate using Gaussian quadrature and sum.
You're estimating a Riemann integral. There are many methods to do this, of varying complexity. Simpson's rule is reasonably straightforward and will be pretty accurate as long as the cross-sectional area varies in a smooth enough fashion. However, it requires that the number of intervals be even.
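A short Python sketch of composite Simpson's rule for this problem, assuming the cross-sections are equally spaced and include both ends of the object. The function name is made up for the example.

```python
import math


def simpson_volume(areas, spacing):
    """Composite Simpson's rule over equally spaced cross-section areas."""
    intervals = len(areas) - 1
    if intervals < 2 or intervals % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of intervals")
    odd = sum(areas[1:-1:2])    # interior samples at odd indices, weight 4
    even = sum(areas[2:-1:2])   # interior samples at even indices, weight 2
    return spacing / 3.0 * (areas[0] + 4.0 * odd + 2.0 * even + areas[-1])


# Unit sphere again: 11 cross-sections, i.e. 10 intervals of width 0.2.
n = 10
spacing = 2.0 / n
heights = [-1.0 + i * spacing for i in range(n + 1)]
sphere_areas = [math.pi * (1.0 - z * z) for z in heights]
sphere_volume = simpson_volume(sphere_areas, spacing)
```

Because the sphere's cross-sectional area is a quadratic in the height, Simpson's rule reproduces its volume up to rounding error even with this few sections; real objects will not be that kind.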
Ed Heal's answer is a Riemann sum that approaches the (volume) integral in the limit. Depending on where the cross-sections are located with respect to the extent of the object, it might be viewed as an application of the midpoint rule.
Assuming the cross-section area varies smoothly with distance (twice continuously differentiable along the axis perpendicular to the cross-sections), the midpoint rule and trapezoid rule have accuracy that improves with the square of the interval width (here assumed regular). Combining the midpoint and trapezoid rule approximations in a weighted average (two parts midpoint to one part trapezoid) amounts to an application of Simpson's rule, outlined in Peter Milley's answer, with higher order accuracy (improving with the fourth power of the interval width) provided the integrand is sufficiently smooth (continuous 4th derivative of cross-section area with respect to distance).
Of course many real world figures will not have such smoothness (too many corners, holes, etc.), so it is prudent not to expect exceptional accuracy from making more sophisticated approximations.

Minimising interpolation error between two data sets

In the top of the diagrams below we can see some value (y-axis) changing over time (x-axis).
As this happens we are sampling the value at different and unpredictable times. We are also alternating the sampling between two data sets, indicated by red and blue.
When computing the value at any time, we expect that both red and blue data sets will return similar values. However as shown in the three smaller boxes this is not the case. Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
Initially I used linear interpolation to obtain a value; next I tried using Catmull-Rom interpolation. The former results in values that come close together and then drift apart between each data point; the latter results in values which remain closer, but where the average error is greater.
Can anyone suggest another strategy or interpolation method which will provide greater smoothing (perhaps by using a greater number of sample points from each data set)?
I believe what you ask is a question that does not have a straight answer without further knowledge of the underlying sampled process. By its nature, the value of the function between samples can be virtually anything, so I think there is no way to ensure that the interpolations of two sample arrays converge.
That said, if you have prior knowledge of the underlying process, then you can choose among several interpolation methods to minimize the errors. For example, if you measure the drag force as a function of the wing velocity, you know the relation is quadratic (a*V^2). Then you can choose polynomial fitting of the 2nd order and get a pretty good match between the interpolations of the two series.
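As a sketch of that idea, the following Python snippet fits the single coefficient a of the model a*V^2 by least squares to each data set separately. It is a simplification of a full 2nd-order polynomial fit (no linear or constant term), and the sample velocities and the value 0.5 are hypothetical.

```python
def fit_quadratic_coefficient(velocities, forces):
    """Least-squares estimate of a in the model force = a * velocity**2."""
    numerator = sum(f * v ** 2 for v, f in zip(velocities, forces))
    denominator = sum(v ** 4 for v in velocities)
    return numerator / denominator


# Hypothetical red and blue samples of the same process, taken at different velocities.
true_a = 0.5
red_v = [1.0, 3.0, 5.0, 7.0]
blue_v = [2.0, 4.0, 6.0, 8.0]
red_a = fit_quadratic_coefficient(red_v, [true_a * v ** 2 for v in red_v])
blue_a = fit_quadratic_coefficient(blue_v, [true_a * v ** 2 for v in blue_v])

# Both fitted models now predict (almost) the same value at any velocity.
red_prediction = red_a * 4.5 ** 2
blue_prediction = blue_a * 4.5 ** 2
```

Because both data sets are fitted to the same model rather than interpolated point to point, the two curves agree wherever the model holds, regardless of where the samples were taken.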
Try B-splines: Catmull-Rom interpolates (goes through the data points), B-spline does smoothing.
For example, for uniformly-spaced data (not your case)
Bspline(t) = (data(t-1) + 4*data(t) + data(t+1)) / 6
Of course the interpolated red / blue curves depend on the spacing of the red / blue data points, so they cannot match perfectly.
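The uniform-spacing formula above can be sketched in Python as follows. Leaving the two end samples unchanged is an assumption of this sketch, and non-uniform spacing (the asker's case) would need a proper B-spline fit instead.

```python
def bspline_smooth(data):
    """Uniform cubic B-spline value at each interior sample: (d[i-1] + 4*d[i] + d[i+1]) / 6."""
    if len(data) < 3:
        return list(data)
    smoothed = [data[0]]  # endpoints are left unchanged in this sketch
    for i in range(1, len(data) - 1):
        smoothed.append((data[i - 1] + 4 * data[i] + data[i + 1]) / 6)
    smoothed.append(data[-1])
    return smoothed


spike = bspline_smooth([0, 0, 6, 0, 0])   # -> [0, 1.0, 4.0, 1.0, 0]
ramp = bspline_smooth([0, 1, 2, 3, 4])    # a straight line is left as it is
```

The spike is spread over its neighbours rather than reproduced exactly, which is the smoothing behaviour that distinguishes this from an interpolating spline.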
I'd like to quote Introduction to Catmull-Rom Splines to suggest not using Catmull-Rom for this interpolation task.
One of the features of the Catmull-Rom spline is that the specified curve will pass through all of the control points - this is not true of all types of splines.
By definition your red interpolated curve will pass through all red data points and your blue interpolated curve will pass through all blue points. Therefore you won't get a best fit for both data sets.
You might change your boundary conditions and use data points from both data sets for a piecewise approximation as shown in these slides.
I agree with ysap that this question cannot be answered as you may be expecting. There may be better interpolation methods, depending on your model dynamics - as with ysap, I recommend methods that utilize the underlying dynamics, if known.
Regarding the red/blue samples, I think you have made a good observation about sampled and interpolated data sets and I would challenge your original expectation that:
When computing the value at any time, we expect that both red and blue data sets will return similar values.
I do not expect this. If you assume that you cannot perfectly interpolate - and particularly if the interpolation error is large compared to the errors in samples - then you are certain to have a continuous error function that exhibits its largest errors farthest (in time) from your sample points. Therefore two data sets that have differing sample points should exhibit the behaviour you see, because points that are far (in time) from red sample points may be near (in time) to blue sample points and vice versa - if staggered as your points are, this is sure to be true. Thus I would expect what you show, that:
Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
(If you do not have information about underlying dynamics (except frequency content), then Giacomo's points on sampling are key - however, you need not interpolate if looking at info below Nyquist.)
When sampling the original continuous function, the sampling frequency should comply with the Nyquist-Shannon sampling theorem, otherwise the sampling process introduces an error (also known as aliasing). The error, being different in the two datasets, results in a different value when you interpolate.
Therefore, you need to know the highest frequency B of the original function and then collect samples with a frequency at least 2B. If your function has very high frequencies and you cannot sample that fast, you should at least try to filter them away before sampling.
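A small Python sketch of aliasing, with frequencies chosen purely for illustration: a 9 Hz sine sampled at 10 Hz produces exactly the samples of an inverted 1 Hz sine, so no interpolation method can recover the original from them.

```python
import math


def sample_sine(frequency_hz, sample_rate_hz, count):
    """Sample sin(2*pi*f*t) at t = 0, 1/rate, 2/rate, ..."""
    return [math.sin(2.0 * math.pi * frequency_hz * k / sample_rate_hz) for k in range(count)]


# A 9 Hz signal sampled at only 10 Hz (the rule above would ask for at least 18 Hz)...
fast = sample_sine(9.0, 10.0, 20)
# ...yields the same samples as an inverted 1 Hz signal: the 9 Hz content has aliased.
slow_inverted = [-s for s in sample_sine(1.0, 10.0, 20)]
max_difference = max(abs(a - b) for a, b in zip(fast, slow_inverted))
```

The difference between the two sample lists is only floating-point rounding, which is why high frequencies have to be filtered out before sampling rather than after.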

Point Sequence Interpolation

Given an arbitrary sequence of points in space, how would you produce a smooth continuous interpolation between them?
2D and 3D solutions are welcome. Solutions that produce a list of points at arbitrary granularity and solutions that produce control points for bezier curves are also appreciated.
Also, it would be cool to see an iterative solution that could approximate early sections of the curve as it received the points, so you could draw with it.
The Catmull-Rom spline is guaranteed to pass through all the control points. I find this to be handier than trying to adjust intermediate control points for other types of splines.
This PDF by Christopher Twigg has a nice brief introduction to the mathematics of the spline. The best summary sentence is:
Catmull-Rom splines have C1 continuity, local control, and interpolation, but do not lie within the convex hull of their control points.
Said another way, if the points indicate a sharp bend to the right, the spline will bank left before turning to the right (there's an example picture in that document). The tightness of those turns is controllable, in this case using his tau parameter in the example matrix.
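A minimal Python sketch of a Catmull-Rom evaluation that works for 2D or 3D tuples. The tension argument tau is assumed here to play the same role as the tau parameter mentioned above, with 0.5 giving the standard Catmull-Rom spline; the function names and the trick of duplicating the end points are choices made for this example.

```python
def catmull_rom_point(p0, p1, p2, p3, t, tau=0.5):
    """Point on the segment between p1 and p2 (t in [0, 1]); points are coordinate tuples."""
    t2, t3 = t * t, t * t * t
    # Basis weights for p0..p3, from the cardinal-spline matrix with tension tau.
    b0 = -tau * t + 2.0 * tau * t2 - tau * t3
    b1 = 1.0 + (tau - 3.0) * t2 + (2.0 - tau) * t3
    b2 = tau * t + (3.0 - 2.0 * tau) * t2 + (tau - 2.0) * t3
    b3 = -tau * t2 + tau * t3
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d for a, b, c, d in zip(p0, p1, p2, p3))


def catmull_rom_chain(points, samples_per_segment=10, tau=0.5):
    """Sample a curve through every point; the end points are duplicated as phantom neighbours."""
    padded = [points[0]] + list(points) + [points[-1]]
    curve = []
    for i in range(len(padded) - 3):
        p0, p1, p2, p3 = padded[i:i + 4]
        for s in range(samples_per_segment):
            curve.append(catmull_rom_point(p0, p1, p2, p3, s / samples_per_segment, tau))
    curve.append(tuple(float(c) for c in points[-1]))
    return curve


curve = catmull_rom_chain([(0, 0), (1, 2), (3, 3), (4, 0)])
```

Each segment depends only on four neighbouring points, so a drawing tool can emit the segment before the newest point as soon as that point arrives.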
Here is another example with some downloadable DirectX code.
One way is the Lagrange polynomial, which is a method for producing a polynomial that goes through all the given data points.
During my first year at university, I wrote a little tool to do this in 2D, and you can find it on this page; it is called Lagrange solver. Wikipedia's page also has a sample implementation.
How it works is thus: you have a polynomial p(x) of degree n-1, where n is the number of points you have. It has the form a_(n-1) x^(n-1) + a_(n-2) x^(n-2) + ... + a_0, where _ is subscript and ^ is power. You then turn this into a set of simultaneous equations:
p(x_1) = y_1
p(x_2) = y_2
...
p(x_n) = y_n
You convert the above into an augmented matrix, and solve for the coefficients a_0 ... a_(n-1). Then you have a polynomial which goes through all the points, and you can now interpolate between the points.
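The simultaneous-equation approach can be sketched in Python as below. It assumes all x values are distinct (n points then determine a polynomial of degree at most n-1) and uses exact fractions to sidestep rounding problems; the function names are made up for the example.

```python
from fractions import Fraction


def fit_polynomial(xs, ys):
    """Solve for coefficients [a_0, a_1, ...] of the polynomial through all (x, y) points."""
    n = len(xs)
    # Augmented matrix: each row is [1, x, x^2, ..., x^(n-1) | y].
    rows = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)] for x, y in zip(xs, ys)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def evaluate(coeffs, x):
    """Evaluate a_0 + a_1*x + a_2*x^2 + ... using Horner's scheme."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


coeffs = fit_polynomial([0, 1, 2], [1, 3, 7])   # the points lie on 1 + x + x^2
value_at_3 = evaluate(coeffs, 3)
```

For more than a handful of points the resulting polynomial tends to oscillate between the samples, which is one reason the spline answers in this thread are usually preferred.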
Note however, this may not suit your purpose as it offers no way to adjust the curvature etc - you are stuck with a single solution that can not be changed.
You should take a look at B-splines. Their advantage over Bezier curves is that each part is only dependent on local points. So moving a point has no effect on parts of the curve that are far away, where "far away" is determined by a parameter of the spline.
The problem with the Lagrange polynomial is that adding a point can have extreme effects on seemingly arbitrary parts of the curve; there's no "localness" like described above.
Have you looked at the Unix spline command? Can that be coerced into doing what you want?
There are several algorithms for interpolating (and extrapolating) between an arbitrary (but finite) set of points. You should check out Numerical Recipes; it also includes C++ implementations of those algorithms.
Unfortunately the Lagrange or other forms of polynomial interpolation will not work on an arbitrary set of points. They only work on a set whose points are strictly increasing in one dimension, e.g. x_i < x_(i+1).
For an arbitrary set of points, e.g. an aeroplane flight path, where each point is a (longitude, latitude) pair, you will be better off simply modelling the aeroplane's journey with current longitude & latitude and velocity. By adjusting the rate at which the aeroplane can turn (its angular velocity) depending on how close it is to the next waypoint, you can achieve a smooth curve.
The resulting curve would not be mathematically significant nor give you bezier control points. However the algorithm would be computationally simple regardless of the number of waypoints and could produce an interpolated list of points at arbitrary granularity. It would also not require you provide the complete set of points up front, you could simply add waypoints to the end of the set as required.
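A rough Python sketch of this steering idea on a flat 2D plane. It simplifies the description above by using a fixed maximum turn per step instead of one that depends on the distance to the waypoint, and every name and default value (step, max_turn, arrive_radius, max_steps) is an assumption of the sketch.

```python
import math


def fly_path(waypoints, step=0.1, max_turn=0.5, arrive_radius=0.3, max_steps=10000):
    """Simulate a craft that starts at waypoints[0] and steers towards each later waypoint."""
    x, y = waypoints[0]
    heading = 0.0
    if len(waypoints) > 1:
        heading = math.atan2(waypoints[1][1] - y, waypoints[1][0] - x)
    path = [(x, y)]
    target_index = 1
    for _ in range(max_steps):
        if target_index >= len(waypoints):
            break
        tx, ty = waypoints[target_index]
        if math.hypot(tx - x, ty - y) < arrive_radius:
            target_index += 1  # close enough: aim for the next waypoint
            continue
        desired = math.atan2(ty - y, tx - x)
        # Smallest signed angle from the current heading to the desired one.
        error = (desired - heading + math.pi) % (2.0 * math.pi) - math.pi
        # Limit the turn per step; this is what rounds off the corners.
        heading += max(-max_turn, min(max_turn, error))
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        path.append((x, y))
    return path


path = fly_path([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
```

The curve passes near the waypoints rather than through them, and new waypoints can simply be appended to the list while the simulation is still running on earlier ones.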
I came across the same problem and implemented a solution with some friends the other day. I'd like to share the example project on GitHub.
https://github.com/johnjohndoe/PathInterpolation
Feel free to fork it.
Google "orthogonal regression".
Whereas least-squares techniques try to minimize vertical distance between the fit line and each f(x), orthogonal regression minimizes the perpendicular distances.
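For a straight line in 2D, orthogonal regression has a closed form, sketched here in Python. The function name is made up, and the result is returned as a point on the line plus a unit direction so that vertical lines are handled too.

```python
import math


def orthogonal_line_fit(points):
    """Return (centroid, direction) of the line minimising squared perpendicular distances."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Angle of the principal axis of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))


centroid, direction = orthogonal_line_fit([(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)])
vertical_centroid, vertical_direction = orthogonal_line_fit([(3, 0), (3, 1), (3, 2)])
```

Ordinary least squares cannot fit the second data set at all because its slope is infinite, whereas the perpendicular-distance formulation treats x and y symmetrically.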
Addendum
In the presence of noisy data, the venerable RANSAC algorithm is worth checking out too.
In the 3D graphics world, NURBS are popular. Further info is easily googled.

Resources