What is the purpose of this part of the Monte Carlo path tracing algorithm?

In all of the simple path tracing algorithms that use lots of Monte Carlo samples, the path-tracing step randomly chooses between returning the emitted value of the current surface and continuing by tracing another ray into that surface's hemisphere (for example, in the slides here). Like so:
```
TracePath(p, d) returns (r,g,b) [and calls itself recursively]:
    Trace ray (p, d) to find nearest intersection p'
    Select with probability (say) 50%:
        Emitted:
            return 2 * (Le_red, Le_green, Le_blue)  // 2 = 1/(50%)
        Reflected:
            generate ray in random direction d'
            return 2 * fr(d -> d') * (n dot d') * TracePath(p', d')
```
Is this just a way of using Russian roulette to terminate a path while remaining unbiased? Surely it would make more sense to count the emissive and reflective properties for all ray paths together and use Russian roulette just to decide whether to continue tracing or not.
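Both schemes are unbiased, which can be checked numerically on a toy "furnace" scene. This is an assumption made for the sketch: every ray hits a surface with emission `LE` and reflectance `RHO`, and directions are importance-sampled so each bounce multiplies the path throughput by `RHO`. The exact answer is then the geometric series LE / (1 - RHO). `coin_flip_path` follows the pseudocode above; `roulette_path` is the alternative proposed in the question (add emission at every vertex, use Russian roulette only to stop), with a continuation probability equal to `RHO` as an illustrative choice.

```python
import random

# Toy furnace scene (assumption): constant emission LE and reflectance RHO at
# every hit, with fr * cos / pdf == RHO for each sampled bounce.
LE, RHO = 1.0, 0.6
EXACT = LE / (1.0 - RHO)          # LE * (1 + RHO + RHO**2 + ...)


def coin_flip_path(rng: random.Random) -> float:
    """The question's scheme: 50% 'Emitted' (weight 2), 50% 'Reflected' (weight 2)."""
    throughput = 1.0
    while True:
        if rng.random() < 0.5:
            return throughput * 2.0 * LE      # Emitted branch ends the path
        throughput *= 2.0 * RHO               # Reflected branch, keep tracing


def roulette_path(rng: random.Random, p_continue: float = RHO) -> float:
    """Count emission at every vertex; Russian roulette only decides when to stop."""
    radiance, throughput = 0.0, 1.0
    while True:
        radiance += throughput * LE           # emission at this vertex, counted once
        if rng.random() >= p_continue:
            return radiance                   # path terminated
        throughput *= RHO / p_continue        # reweight survivors to stay unbiased


def estimate(path_fn, n: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(path_fn(rng) for _ in range(n)) / n


if __name__ == "__main__":
    print("exact     ", EXACT)
    print("coin flip ", estimate(coin_flip_path, 100_000))
    print("roulette  ", estimate(roulette_path, 100_000))
```

Both estimates agree with the exact value up to Monte Carlo noise; how noisy each one is depends on the scene, so this only checks unbiasedness.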
And here's a follow-up question: why do some of these algorithms I'm seeing (like in the book 'Physically Based Rendering Techniques') only compute emission once, instead of taking into account all the emissive properties on an object? The rendering equation is basically
L_o = L_e + integral of (light exiting other surfaces into the hemisphere of this surface)
which seems like it counts the emissive properties in both this L_o and the integral of all the other L_o's, so the algorithms should follow.
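One way to see why a path tracer adds each emission term only once is to unroll the recursion. Writing the integral part of the equation above as an operator $T$ (notation introduced here for the sketch, not taken from the question), the outgoing radiance expands into a sum over path lengths:

$$
L_o = L_e + T L_o = L_e + T L_e + T^2 L_e + T^3 L_e + \cdots,
\qquad
(T L)(x,\omega_o) = \int_{\Omega} f_r(x,\omega_i,\omega_o)\, L\big(x'(x,\omega_i), -\omega_i\big)\, (n \cdot \omega_i)\, d\omega_i
$$

Here $x'(x,\omega_i)$ is the surface point seen from $x$ in direction $\omega_i$, and $T^k L_e$ is light emitted at the $k$-th surface along the path. A single sampled path estimates the whole series: at each vertex it adds that vertex's own $L_e$ once, weighted by the product of the $f_r\,(n\cdot\omega_i)/\mathrm{pdf}$ factors accumulated so far. The $L_e$ hidden inside the integral belongs to a different surface, so nothing is counted twice.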

In reality, the single emission vs. reflection calculation is a bit too simplistic. To answer the first question: yes, the coin flip is used to terminate the path. The factor of 2 (= 1/50%) keeps the estimate unbiased, but the scheme can add a lot of noise (variance). The second question is a bit more complex....
In the abstract of Shirley, Wang and Zimmerman TOG 94, the authors briefly summarize the benefits and complexities of Monte Carlo sampling:
In a distribution ray tracer, the crucial part of the direct lighting calculation is the sampling strategy for shadow ray testing. Monte Carlo integration with importance sampling is used to carry out this calculation. Importance sampling involves the design of integrand-specific probability density functions which are used to generate sample points for the numerical quadrature. Probability density functions are presented that aid in the direct lighting calculation from luminaires of various simple shapes. A method for defining a probability density function over a set of luminaires is presented that allows the direct lighting calculation to be carried out with one sample, regardless of the number of luminaires.
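The last sentence of that abstract, one sample regardless of the number of luminaires, rests on picking a single light from a discrete probability distribution and dividing its contribution by that probability. A minimal Python sketch of the idea, not the paper's actual PDFs: the scene, the power-proportional selection rule and the simplified `contribution` model are assumptions for illustration. Averaging the one-light estimates converges to the sum over all lights.

```python
import random

# Hypothetical scene, for illustration only: (power, distance) per luminaire.
# contribution() = power / distance**2 ignores visibility, cosines and light
# area; it just gives each light a different, known contribution.
LIGHTS = [(100.0, 2.0), (10.0, 1.0), (50.0, 5.0), (1.0, 0.5)]


def contribution(power: float, distance: float) -> float:
    return power / distance ** 2


def one_sample_direct(rng: random.Random) -> float:
    """Pick ONE luminaire with probability proportional to its power and
    divide its contribution by that probability (a discrete PDF over lights)."""
    powers = [p for p, _ in LIGHTS]
    i = rng.choices(range(len(LIGHTS)), weights=powers)[0]
    prob = powers[i] / sum(powers)
    return contribution(*LIGHTS[i]) / prob


def estimate_direct(n: int, seed: int = 1) -> float:
    rng = random.Random(seed)
    return sum(one_sample_direct(rng) for _ in range(n)) / n


exact = sum(contribution(p, d) for p, d in LIGHTS)   # what summing every light gives
print(exact, estimate_direct(100_000))
```

Each shading point pays for one shadow ray no matter how many lights exist; a better selection PDF (closer to each light's actual contribution) only reduces the noise.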
If we start dissecting that abstract, here are some of the important points:
Lights aren't points: in real life, we're almost never dealing with a point light source (e.g., a single LED).
Shadows are usually soft: this is a consequence of the non-point lights. It's very rare to see a truly hard-edged shadow in real life.
Noise (especially bright sampling artifacts) is disproportionately distracting: humans have a lot of intuition about how things should look. Look at slide 5 (the glass sphere on a table) in the OP's linked presentation. Note the bright specks in the shadow.
When rendering for more visual realism, both the reflected visibility rays and the lighting calculation rays must be sampled and weighted according to the surface's bidirectional reflectance distribution function (BRDF).
Note that this is a guided sampling method that's distinctly different from the original question's "generate ray in random direction" method in that it is both:
More accurate: the images in the linked PDF suffer a bit from the PDF process. Figure 10 is a reasonable representation of the original; note the lack of the bright speckle artifacts that you will sometimes see (as in slide 5 of the original presentation).
Significantly faster: as the original presentation notes, unguided Monte Carlo sampling can take quite a while to converge. More sampling rays = much more computation = more time.
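The gap between guided and unguided sampling shows up even in a one-line integral. This Python sketch estimates the integral of cos(theta) over the hemisphere, whose exact value is pi, once with uniformly random directions and once with cosine-weighted directions (a simple importance-sampling PDF matching a diffuse BRDF times the cosine term). The integrand and both samplers are chosen for illustration; real renderers sample the full BRDF and the lights.

```python
import math
import random
import statistics


def uniform_estimate(rng: random.Random) -> float:
    # Uniform hemisphere sampling: cos(theta) is uniform on [0, 1), pdf = 1 / (2*pi).
    cos_theta = rng.random()
    return cos_theta / (1.0 / (2.0 * math.pi))


def cosine_estimate(rng: random.Random) -> float:
    # Cosine-weighted sampling: cos(theta) = sqrt(1 - u), pdf = cos(theta) / pi.
    cos_theta = math.sqrt(1.0 - rng.random())
    return cos_theta / (cos_theta / math.pi)


def summarize(sampler, n: int = 50_000, seed: int = 0):
    rng = random.Random(seed)
    xs = [sampler(rng) for _ in range(n)]
    return statistics.fmean(xs), statistics.pvariance(xs)


print("exact      ", math.pi)
print("uniform    ", summarize(uniform_estimate))
print("cos-weight ", summarize(cosine_estimate))
```

Because the cosine-weighted PDF is proportional to this integrand, every sample returns pi and the variance is essentially zero; uniform sampling has the same mean but needs many more samples to average out its noise.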

After reading the slides (thank you for posting), I'll amend my answer as best I can.
Is this just a way of using Russian roulette to terminate a path while remaining unbiased? Surely it would make more sense to count the emissive and reflective properties for all ray paths together and use Russian roulette just to decide whether to continue tracing or not.
Perhaps the emitted and reflected properties are treated differently because the reflected path depends on the incident path in a way that emitted paths do not (at least for a spectral surface). Does the algorithm take a Bayesian approach and use prior information about the incidence angle as a prior for predicting the reflective angle? Or is this a Feynman integration over all paths to come up with a probability? It's hard to tell without digging deeper into the details of the theory.
My earlier black body comment is quite incorrect. I see that the slides talk about (R, G, B) components; black body emissivities are integrated over all wavelengths.
And here's a follow-up question: why do some of these algorithms I'm seeing (like in the book 'Physically Based Rendering Techniques') only compute emission once, instead of taking into account all the emissive properties on an object? The rendering equation is basically
L_o = L_e + integral of (light exiting other surfaces into the hemisphere of this surface)
A single emissivity for the surface would assume that it has no dependence on wavelength or direction. I don't know how significant that is for rendering photo-realistic images.
The ones that are posted are certainly impressive. I wonder how different they would look if the complexities that you have in mind were included?
Thank you for posting a nice question - I'm voting it up. It's been a long time since I've thought about this kind of problem. I wish I could be more helpful.

Yes that is a very basic implementation of Russian Roulette, though normally the probability of terminating would take into account the light intensity (i.e. less light means the value contributes less to the final summation so use a higher probability of terminating).
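A minimal Python sketch of that idea on a toy furnace scene (an assumption: constant emission and a coloured albedo, with directions importance-sampled so each bounce multiplies the throughput by the albedo). The survival probability here is the largest channel of the current path throughput, capped at `max_survival`; that rule and the cap are illustrative choices, not taken from the answer. Dividing the throughput by the survival probability keeps the estimate unbiased, so each channel converges to Le / (1 - albedo).

```python
import random

# Toy furnace scene (assumption): emission LE per channel and a coloured
# albedo RHO, with fr * cos / pdf == RHO per bounce.
LE = (1.0, 1.0, 1.0)
RHO = (0.8, 0.5, 0.2)
EXACT = [le / (1.0 - r) for le, r in zip(LE, RHO)]


def trace(rng: random.Random, max_survival: float = 0.95) -> list:
    radiance = [0.0, 0.0, 0.0]
    beta = [1.0, 1.0, 1.0]                    # path throughput per channel
    while True:
        for c in range(3):
            radiance[c] += beta[c] * LE[c]
        beta = [b * r for b, r in zip(beta, RHO)]
        # Dimmer paths get a lower survival probability, i.e. they are
        # terminated more often because they contribute less.
        q = min(max_survival, max(beta))
        if rng.random() >= q:
            return radiance
        beta = [b / q for b in beta]          # compensate survivors: stays unbiased


def estimate(n: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(n):
        for c, v in enumerate(trace(rng)):
            totals[c] += v
    return [t / n for t in totals]


print(EXACT, estimate(50_000))
```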


OpenMDAO Dymos defect_refs -- how should I set these?

I was hoping to get some information on how to set my defect refs in Dymos a smart way. I found the following notes on scaling here https://github.com/hweyandtnasa/scaling-tutorial but it lists defect scaling in Dymos as a TODO still. Should I just set them equal to the ref value for the state they pertain to?
Scaling pseudospectral optimal control problems is tricky. If you can get a copy of John Betts' Practical Methods for Optimal Control and Estimation Using Nonlinear Programming, I highly recommend it. Betts suggests using the same scaling for both the state design variable values and the defects. This is often a good rule of thumb but, as with most approaches to scaling, it isn't universal. The collocation "defects", which indicate whether the dynamics are physically correct, are just the difference between the slope of the approximating polynomial and the computed equations of motion.
In situations where state values are large but tiny rates of change are significant, different scaling is warranted in my experience. Examples of states where this can be true are aircraft range or spacecraft orbital elements. Just recently we had a situation where a low-thrust orbit transfer of a spacecraft wasn't matching physics. The semi-latus rectum, for instance, is typically measured in km, so it is on the scale of thousands when in Earth orbit. In the units being used, a "significant" difference in the defect was less than 1E-6 (the feasibility threshold being used). In this case, the problem was solved by bumping the defect_scaler up a few orders of magnitude (equivalent to bumping the defect_ref down a few orders of magnitude).
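The arithmetic behind that fix can be sketched in plain Python. It assumes the usual OpenMDAO convention that, with ref0 left at 0, a constraint is scaled as value / ref (so defect_scaler = 1 / defect_ref, as described above). The numbers, a defect of 5e-4 km and a feasibility tolerance of 1e-6, are hypothetical and only mirror the situation described; the helper names are made up for the sketch.

```python
# Hypothetical numbers: a state in km with ref ~1e3, a feasibility tolerance of
# 1e-6 on the scaled defects, and a physically significant defect of 5e-4 km.
FEAS_TOL = 1e-6


def scaled_defect(defect: float, defect_ref: float) -> float:
    """What the optimizer sees, assuming ref0 = 0: defect / defect_ref."""
    return defect / defect_ref


def looks_feasible(defect: float, defect_ref: float, tol: float = FEAS_TOL) -> bool:
    return abs(scaled_defect(defect, defect_ref)) <= tol


defect_km = 5e-4
for ref in (1e3, 1.0):   # defect_ref equal to the state ref, then 3 orders smaller
    print(f"defect_ref={ref:g}: scaled={scaled_defect(defect_km, ref):.1e}, "
          f"feasible={looks_feasible(defect_km, ref)}")
```

With defect_ref matching the state's ref, the 0.5 m error hides below the tolerance; shrinking defect_ref by three orders of magnitude makes the same error a visible constraint violation.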
I'd also recommend this paper from Ross, Gong, Karpenko, and Proulx. It lays out some good rules of thumb and has an approachable example in the brachistochrone. It references costates a lot. Dymos doesn't provide automatic costate estimation yet, but costates are closely related to the Lagrange multipliers of the problem, which are printed in the pyoptsparse output if you use SNOPT.
The GitHub repo you pointed out was the work of an intern and was based on this scaling method developed by Sagliano. We found it to work well in many situations, but it's also not a panacea.
Ultimately we want some automatic scaling options in Dymos and/or OpenMDAO, but we're not sure when they might find their way into the framework. Our past work has typically tied scaling approaches more tightly to the equations of motion, and Dymos is designed to be more general in that the user can supply whatever EOM they choose.
In Dymos, if you leave the defect_ref value unset when you call set_state_options, then the default behavior is to make the defect_ref equal to the ref value. Here is why that is done:
Defects are the differences between the computed state rate from the polynomial interpolation function and the actual state rate computed by the ODE.
As you can see here:
defect = (f_approx-f_computed) * dt_dstau
The dt_dstau factor maps the derivative into a normalized time space called tau, and because tau is dimensionless it also multiplies by the time unit. That means the defects are computed in the same units as the states themselves. Thus a reasonable guess for scaling is to match the scaling between the states and the defects. As Rob Falck's answer points out, that is not always the right solution, but it's a good starting point.
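To make the units argument concrete, here is a minimal NumPy sketch of the defect formula for a single segment. It is not Dymos' actual code; the three-node segment, the interpolating polynomial and the name `collocation_defects` are assumptions for illustration. With the state in km and time in s, the rates are km/s, and multiplying by dt_dstau (s) gives the defect back in km.

```python
import numpy as np


def collocation_defects(tau, x, f_computed, t0, tf):
    """Sketch of defect = (f_approx - f_computed) * dt_dstau for one segment.

    tau        -- node locations in dimensionless segment time, in [-1, 1]
    x          -- state values at the nodes (e.g. km)
    f_computed -- state rates from the ODE at the nodes (e.g. km/s)
    t0, tf     -- segment start and end times (e.g. s)
    """
    dt_dstau = (tf - t0) / 2.0                     # seconds per unit of tau
    coeffs = np.polyfit(tau, x, len(tau) - 1)      # interpolating polynomial x(tau)
    dx_dtau = np.polyval(np.polyder(coeffs), tau)  # slope in km per unit tau
    f_approx = dx_dtau / dt_dstau                  # slope in km/s
    return (f_approx - f_computed) * dt_dstau      # (km/s) * s -> km, the state's units


# Example: x(t) = t**2 on t in [0, 10] with three nodes.
tau = np.array([-1.0, 0.0, 1.0])
t = 5.0 + 5.0 * tau
print(collocation_defects(tau, t**2, 2 * t, 0.0, 10.0))        # ODE consistent with x
print(collocation_defects(tau, t**2, 2 * t + 1.0, 0.0, 10.0))  # ODE off by 1 km/s
```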

Range-based positioning/trilateration: Solving with a Kalman filter, smoothing with a particle filter (and vice versa)?

So, in this question I'd be grateful for hints and further information on whether I am correct or not.
To calculate the position from range measurements to fixed anchors (as in GPS) you need to solve the trilateration problem, for example with non-linear least squares, geometrical algorithms, or the particle filter, which is also able to solve the trilateration problem as such.
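For the non-linear least-squares route, a minimal Gauss-Newton sketch with NumPy. The 2-D setup, the four anchor positions, the noise-free ranges in the example and the helper name `trilaterate_nlls` are all illustrative assumptions.

```python
import numpy as np


def trilaterate_nlls(anchors, ranges, x0, iters=20, tol=1e-12):
    """Gauss-Newton solution of min_p sum_i (||p - a_i|| - r_i)**2.

    anchors -- (m, d) array of known anchor positions
    ranges  -- (m,) measured ranges
    x0      -- initial guess (must not coincide with an anchor)
    """
    p = np.asarray(x0, dtype=float)
    for _ in range(iters):
        diff = p - anchors                      # vectors from each anchor to p
        pred = np.linalg.norm(diff, axis=1)     # predicted ranges
        J = diff / pred[:, None]                # Jacobian rows: unit line-of-sight vectors
        step, *_ = np.linalg.lstsq(J, ranges - pred, rcond=None)
        p = p + step
        if np.linalg.norm(step) < tol:
            break
    return p


anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
truth = np.array([3.0, 4.0])
ranges = np.linalg.norm(truth - anchors, axis=1)   # noise-free, for the example
print(trilaterate_nlls(anchors, ranges, x0=[5.0, 5.0]))
```

Run epoch by epoch on noisy ranges, this produces exactly the kind of jagged track the question mentions.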
Due to noise/errors the result might be a jagged line -> you can use the Kalman-Filter to smooth it. So far: Particle - calculation, Kalman - smoothing. Now:
Is it possible to use a Kalman-Filter NOT to smoothen an already existing result, BUT to solve the trilateration as such?
Regarding the particle filter: How can the particle filter be used NOT to solve trilateration, BUT to smooth an already existing result (e.g. one calculated with NLLS)?
Best and thank you for any hints, papers, videos, solutions etc.!
The Kalman filter is an optimal estimator for linear Gaussian problems. It is often used to solve the trilateration problem (Question 1). Because the range is a nonlinear function of position, the measurement model is linearized at the current position estimate: the Jacobian (partial derivative of the range measurement with respect to the position) is evaluated there. A Kalman filter that relies on this linearization is called an Extended Kalman Filter, or EKF, in the literature. That works well for GPS because the range to the transmitter is so great that the error in the Jacobian due to position error is negligible even if the Kalman filter is crudely initialized, for example to within 100 km. It breaks down when the 'fixed anchors' are closer to the user: the closer the anchor, the more quickly the line-of-sight vector to the anchor changes with the position estimate. In these cases Unscented Kalman Filters (UKF) or Particle Filters (PF) are sometimes used instead of an EKF.
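A minimal NumPy sketch of an EKF that solves trilateration directly from the ranges (Question 1). Assumptions for illustration: a 2-D position that is static apart from a small random-walk process noise, four hypothetical anchors, independent range noise with variance `r_var`, and the function name `ekf_trilaterate`. The rows of `H` are the unit line-of-sight vectors evaluated at the current estimate, which is the linearization described above.

```python
import numpy as np


def ekf_trilaterate(range_epochs, anchors, x0, P0, r_var, q_var):
    """EKF for a (nearly) static position observed through ranges to fixed anchors."""
    x = np.asarray(x0, dtype=float)
    P = np.asarray(P0, dtype=float)
    R = r_var * np.eye(len(anchors))            # range-noise covariance
    Q = q_var * np.eye(len(x))                  # random-walk process noise
    I = np.eye(len(x))
    for z in range_epochs:
        P = P + Q                               # predict: position assumed constant
        diff = x - anchors
        pred = np.linalg.norm(diff, axis=1)     # predicted ranges
        H = diff / pred[:, None]                # Jacobian evaluated at the current estimate
        S = H @ P @ H.T + R                     # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
        x = x + K @ (z - pred)                  # update with the range residuals
        P = (I - K @ H) @ P
    return x, P


rng = np.random.default_rng(0)
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
truth = np.array([3.0, 4.0])
true_ranges = np.linalg.norm(truth - anchors, axis=1)
epochs = [true_ranges + rng.normal(0.0, 0.1, size=4) for _ in range(50)]
est, cov = ekf_trilaterate(epochs, anchors, x0=[5.0, 5.0], P0=25.0 * np.eye(2),
                           r_var=0.01, q_var=1e-6)
print(est, np.sqrt(np.diag(cov)))
```

Here the anchors are only a few metres away, so a poor initial guess would make the first linearization inaccurate; that is the regime where a UKF or PF becomes attractive.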
The best introduction to the KF and EKF in my view is Applied Optimal Estimation by Gelb. That book has been in print since 1974, and there is a reason why. A discussion of the breakdown of the EKF when the anchor is close can be found in the paper "The Scaled Unscented Transformation" by Julier, which can be found here.
For question 2, the answer is yes, certainly a PF could be used to smooth a solution that is created, for example, by replacing the range measurements with an epoch-by-epoch result from a least-squares solver for the position. I would not recommend the approach. The power of the PF, and the reason we pay the price of computing everything for each particle, is that it handles the non-linearities. To 'pre-linearize' the problem before handing it to the PF defeats its purpose.

When and why is crossover beneficial in differential evolution?

I implemented a differential evolution algorithm for a side project I was doing. Because the crossover step seemed to involve a lot of parameter choices (e.g. crossover probabilities), I decided to skip it and just use mutation. The method seemed to work ok, but I am unsure whether I would get better performance if I introduced crossover.
Main Question: What is the motivation behind introducing crossover to differential evolution? Can you provide a toy example where introducing crossover out-performs pure mutation?
My intuition is that crossover will produce something like the following in two dimensions. Say we have two parent vectors (red). Uniform crossover could produce a new trial vector at one of the blue points.
I am not sure why this kind of exploration would be expected to be beneficial. In fact, it seems like this could make performance worse if high-fitness solutions follow some linear trend. In the figure below, let's say the red points are the current population, and the optimal solution is towards the lower right corner. The population is traveling down a valley such that the upper right and lower left corners produce bad solutions. The upper left corner produces "okay" but suboptimal solutions. Notice how uniform crossover produces trials (in blue) that are orthogonal to the direction of improvement. I've used a crossover probability of 1 and neglected mutation to illustrate my point (see code). I imagine this situation could arise quite frequently in optimization problems, but I could be misunderstanding something.
Note: In the above example, I am implicitly assuming that the population was randomly initialized (uniformly) across this space, and has begun to converge to the correct solution down the central valley (top left to bottom right).
This toy example is convex, and thus differential evolution wouldn't even be the appropriate technique. However, if this motif was embedded in a multi-modal fitness landscape, it seems like crossover might be detrimental. While crossover does support exploration, which could be beneficial, I am not sure why one would choose to explore in this particular direction.
R code for the example above:
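```r
# Red: a population that has converged along a diagonal valley.
N <- 50
x1 <- rnorm(N, mean = 2, sd = 0.5)
x2 <- -x1 + 4 + rnorm(N, mean = 0, sd = 0.1)
plot(x1, x2, pch = 21, col = 'red', bg = 'red', ylim = c(0, 4), xlim = c(0, 4))
# Blue: uniform crossover with probability 1 and no mutation. Each trial keeps
# its own first coordinate and takes the second coordinate from a randomly
# chosen member of the population (sampled with replacement).
x1_cx <- x1
x2_cx <- x2[sample.int(N, N, replace = TRUE)]
points(x1_cx, x2_cx, pch = 4, col = 'blue', lwd = 4)
```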
Follow-up Question: If crossover is beneficial in certain situations, is there a sensible approach to a) determining if your specific problem would benefit from crossover, and b) how to tune the crossover parameters to optimize the algorithm?
A related stackoverflow question (I am looking for something more specific, with a toy example for instance): what is the importance of crossing over in Differential Evolution Algorithm?
A similar question, but not specific to differential evolution: Efficiency of crossover in genetic algorithms
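For reference, a minimal Python sketch of the classic DE/rand/1/bin scheme with the crossover rate `cr` exposed. With `cr=1.0` every coordinate comes from the mutant, which is the mutation-only variant described in the question; smaller `cr` mixes coordinates of the target and the mutant. The population size, `F`, `cr`, the bounds and the sphere test function are illustrative assumptions, and the sketch makes no claim about which setting wins on a given problem.

```python
import random


def binomial_crossover(target, mutant, cr, rng):
    """DE 'bin' crossover: each coordinate comes from the mutant with probability cr;
    one randomly chosen coordinate (j_rand) always comes from the mutant."""
    j_rand = rng.randrange(len(target))
    return [m if (rng.random() < cr or j == j_rand) else t
            for j, (t, m) in enumerate(zip(target, mutant))]


def de_rand_1_bin(f, bounds, pop_size=20, F=0.5, cr=0.9, generations=200, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: base vector plus a scaled difference of two others.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            trial = binomial_crossover(pop[i], mutant, cr, rng)
            trial_cost = f(trial)
            if trial_cost <= cost[i]:           # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


def sphere(x):
    return sum(v * v for v in x)


best_x, best_f = de_rand_1_bin(sphere, [(-5.0, 5.0)] * 5, cr=0.9)
print(best_f)
```

One property relevant to the valley figure: with `cr < 1`, binomial crossover swaps individual coordinates, so the search is not rotation-invariant and favours axis-aligned moves, whereas `cr = 1` is rotation-invariant. A common rule of thumb is a low `cr` for separable problems and a high `cr` (around 0.9) when parameters interact.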
I am not particularly familiar with the specifics of the DE algorithm but in general the point of crossover is that if you have two very different individuals with high fitness it will produce an offspring that is intermediate between them without being particularly similar to either. Mutation only explores the local neighbourhood of each individual without taking the rest of the population into account. If you think of genomes as points in some high dimensional vector space, then a mutation is a shift in a random direction. Therefore mutation needs to take small steps: if you are starting from a significantly better than random position, a long step in a random direction is almost certain to make things worse, because it is essentially just introducing entropy into an evolved genome. You can think of a crossover as a step from one parent towards the other. Since the other parent is also better than random, it is more promising to take a longer step in that direction. This allows for faster exploration of the promising parts of the fitness landscape.
In real biological organisms the genome is often organized in such a way that genes that depend on each other are close together on the same chromosome. This means that crossover is unlikely to break synergetic gene combinations. Real evolution actually moves genes around to achieve this (though this is much slower than the evolution of individual genes) and sometimes the higher order structure of the genome (the 3 dimensional shape of the DNA) evolves to prevent cross-overs in particularly sensitive areas. These mechanisms are rarely modeled in evolutionary algorithms, but you will get more out of crossovers if you order your genome in a way that puts genes that are likely to interact close to each other.
No. Crossover is not useful. There I said it. :P
I've never found a need for crossover. People seem to think it does some kind of magic. But it doesn't (and can't) do anything more useful than simple mutation. Large mutations can be used to explore the entire problem space and small mutations can be used to exploit niches.
And all the explanations I've read are (to put it mildly) unsatisfactory. Crossover only complicates your algorithms. Drop it asap. Your life will be simpler. .... IMHO.
As Daniel says, crossover is a way to take larger steps across the problem landscape, allowing you to escape local maxima that a single mutation could not escape.
Whether it is appropriate or not will depend on the complexity of the problem space, how the genotype -> phenotype expression works (will related genes be close together), etc.
More formally, this is the concept of 'connectivity' in local search algorithms: providing strong enough operators that the local search neighbourhood is sufficiently large to escape local minima.

Rapidly-exploring random trees

http://msl.cs.uiuc.edu/rrt/
Can anyone explain how rrt works with simple wording that is easy to understand?
I read the description in the site and in wikipedia.
What I would like to see, is a short implementation of a rrt or a thorough explanation of the following thing:
Why does the rrt grow outwards instead of just growing very dense around the center?
How is it different from a naive random tree?
How is the next new vertex that we attempt to reach picked?
I know there is a Motion Strategy Library I could download, but I would much rather understand the idea before I delve into the code than the other way around.
The simplest possible RRT algorithm has been so successful because it is pretty easy to implement. Things tend to get complicated when you:
need to visualise planning concepts in more than two dimensions;
are unfamiliar with the terminology associated with planning; and
are faced with the huge number of variants of RRT that have been described in the literature.
Pseudo code
The basic algorithm looks something like this:
1. Start with an empty search tree.
2. Add your initial location (configuration) to the search tree.
3. While your search tree has not reached the goal (and you haven't run out of time):
   3.1. Pick a location (configuration), q_r, with some sampling strategy.
   3.2. Find the vertex in the search tree closest to that random point, q_n.
   3.3. Try to add an edge (path) to the tree between q_n and q_r, if you can link them without a collision occurring.
Although that description is adequate, after a while working in this space, I really do prefer the pseudocode of figure 5.16 on RRT/RDT in Steven LaValle's book "Planning Algorithms".
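The steps listed above can be sketched in Python for a point robot in the plane. This is a minimal illustration, not MSL or OMPL code: circular obstacles, collision checking by testing points along each edge, and the parameters `step`, `goal_bias` and `goal_tol` are choices made here. Step 3.3 is implemented as the common "extend" variant, which moves at most `step` toward the sample instead of connecting all the way, and the loop stops once a new vertex lands within `goal_tol` of the goal.

```python
import math
import random


def rrt(start, goal, obstacles, bounds, step=0.5, goal_bias=0.05,
        goal_tol=0.5, max_iters=5000, seed=0):
    """Minimal 2-D RRT. obstacles: circles (cx, cy, r); bounds: (xmin, xmax, ymin, ymax)."""
    rng = random.Random(seed)
    nodes = [start]                 # steps 1-2: tree containing only the start
    parent = {0: None}

    def collision_free(p, q, checks=10):
        # Test evenly spaced points along the segment p -> q against every circle.
        for k in range(checks + 1):
            s = k / checks
            x, y = p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1])
            if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in obstacles):
                return False
        return True

    for _ in range(max_iters):
        # 3.1 sample a point, occasionally the goal itself (goal-biased sampling)
        if rng.random() < goal_bias:
            q_r = goal
        else:
            q_r = (rng.uniform(bounds[0], bounds[1]), rng.uniform(bounds[2], bounds[3]))
        # 3.2 nearest vertex already in the tree
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], q_r))
        q_n = nodes[i_near]
        # 3.3 extend at most `step` toward the sample if the edge is collision free
        d = math.dist(q_n, q_r)
        if d == 0.0:
            continue
        s = min(step, d) / d
        q_new = (q_n[0] + s * (q_r[0] - q_n[0]), q_n[1] + s * (q_r[1] - q_n[1]))
        if not collision_free(q_n, q_new):
            continue
        nodes.append(q_new)
        parent[len(nodes) - 1] = i_near
        if math.dist(q_new, goal) <= goal_tol:
            path, i = [], len(nodes) - 1    # walk back up the tree to the start
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None


path = rrt(start=(1.0, 1.0), goal=(9.0, 9.0), obstacles=[(5.0, 5.0, 1.5)],
           bounds=(0.0, 10.0, 0.0, 10.0))
print(len(path) if path else "no path found")
```

The linear nearest-neighbour search keeps the sketch short; real implementations use a spatial index such as a k-d tree.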
Tree Structure
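The reason that the tree ends up covering the entire search space (in most cases) is the combination of the sampling strategy and always connecting from the nearest point in the tree. This effect is known as the Voronoi bias: a vertex with a large Voronoi region (typically one on the frontier of the tree) is more likely to be the nearest vertex to a random sample, so the tree is pulled toward unexplored space instead of growing denser around its centre.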
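A quick way to see the Voronoi bias is to count how often each vertex of a small, tight cluster is the nearest neighbour of a uniform sample. In this hypothetical setup (nine vertices on a 3x3 grid with spacing 0.02 at the centre of the unit square), the interior vertex has a tiny Voronoi cell and is almost never chosen for extension, while the outer vertices own nearly all of the square.

```python
import math
import random


def nearest_counts(vertices, n_samples=20_000, seed=0):
    """How often each vertex is the nearest one to a uniform sample in the unit square."""
    rng = random.Random(seed)
    counts = [0] * len(vertices)
    for _ in range(n_samples):
        q = (rng.random(), rng.random())
        i = min(range(len(vertices)), key=lambda k: math.dist(vertices[k], q))
        counts[i] += 1
    return counts


# A tight 3x3 cluster of tree vertices around the centre of the square.
grid = [(0.5 + dx, 0.5 + dy) for dx in (-0.02, 0.0, 0.02) for dy in (-0.02, 0.0, 0.02)]
counts = nearest_counts(grid)
centre = grid.index((0.5, 0.5))
print("interior share:", counts[centre] / sum(counts))
```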
Sampling Strategy
The choice of where to place the next vertex that you will attempt to connect to is the sampling problem. In simple cases, where search is low dimensional, uniform random placement (or uniform random placement biased toward the goal) works adequately. In high dimensional problems, or when motions are very complex (when joints have positions, velocities and accelerations), or configuration is difficult to control, sampling strategies for RRTs are still an open research area.
Libraries
The MSL library is a good starting point if you're really stuck on implementation, but it hasn't been actively maintained since 2003. A more up-to-date library is the Open Motion Planning Library (OMPL). You'll also need a good collision detection library.
Planning Terminology & Advice
From a terminology point of view, the hard bit is to realise that although lots of the diagrams you see in the (early years of) publications on RRT are in two dimensions (trees that link 2D points), this is the absolute simplest case.
Typically, a mathematically rigorous way to describe complex physical situations is required. A good example of this is planning for a robot arm with n links. Describing the pose of such an arm requires a minimum of n joint angles. This minimal set of parameters describing a position is a configuration (some publications call it a state). A single configuration is often denoted q.
The combination of all possible configurations (or a subset thereof) that can be achieved make up a configuration space (or state space). This can be as simple as an unbounded 2d plane for a point in the plane, or incredibly complex combinations of ranges of other parameters.
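To make "configuration" concrete: for a hypothetical planar arm with revolute joints, a configuration q is just the vector of joint angles, and forward kinematics maps it to the end point in the workspace. A planner such as an RRT samples and connects points in q-space, and collision checks are done by mapping each q back into the workspace. A minimal Python sketch; the link lengths and the function name are assumptions.

```python
import math


def end_effector(q, lengths):
    """Forward kinematics of a planar serial arm: joint angles q -> end point (x, y)."""
    x = y = theta = 0.0
    for angle, length in zip(q, lengths):
        theta += angle                  # each joint angle is relative to the previous link
        x += length * math.cos(theta)
        y += length * math.sin(theta)
    return x, y


# A two-link arm: the configuration space is (q1, q2), the workspace is the plane.
print(end_effector([0.0, 0.0], [1.0, 1.0]))          # arm stretched along x
print(end_effector([math.pi / 2, 0.0], [1.0, 1.0]))  # arm pointing along y
```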

Fitness function and selection for a genetic algorithm

I'm trying to design a nonlinear fitness function where I maximize variable A and minimize the variable B. The issue is that maximizing A is much more important at single digit values, almost logarithmic. B needs to be minimized and in contrast to A, it becomes less important when small (less than one) and more important when it's larger (>1), so exponential decay.
The main goal is to optimize A, so I guess an analog is A=profits, B=costs
Should I aim to keep everything positive so that I can use roulette wheel selection, or would it be better to use a rank/tournament kind of system? The purpose of my algorithm is shape optimization.
Thanks
When considering a multi-objective problem, the goal is usually to identify all solutions that lie on the Pareto curve: the Pareto optimal set. Have a look here for a 2-dimensional visual example. When the algorithm completes you want a set of solutions that are not dominated by any other solution. You therefore need to define a Pareto ranking mechanism that takes both objectives into account; for a more in-depth explanation, as well as links to even more reading, go here.
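A minimal Python sketch of Pareto dominance for this problem (maximize A, minimize B) and a simple Pareto ranking that repeatedly peels off non-dominated fronts, in the spirit of NSGA-II's non-dominated sorting. The candidate values are hypothetical, and the quadratic-time loop is chosen for clarity, not speed.

```python
def dominates(s, t):
    """s and t are (A, B) pairs; A is maximized and B is minimized.
    s dominates t if it is no worse in both objectives and better in at least one."""
    (a_s, b_s), (a_t, b_t) = s, t
    return a_s >= a_t and b_s <= b_t and (a_s > a_t or b_s < b_t)


def pareto_ranks(solutions):
    """Rank 0 = non-dominated front; rank 1 = front after removing rank 0; and so on."""
    ranks = [None] * len(solutions)
    remaining = set(range(len(solutions)))
    rank = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(solutions[j], solutions[i]) for j in remaining)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Hypothetical (A, B) values for five candidate shapes.
candidates = [(10.0, 5.0), (8.0, 1.0), (10.0, 2.0), (3.0, 0.5), (2.0, 3.0)]
print(pareto_ranks(candidates))
```

The rank can then serve as the fitness seen by selection, so no single weighted sum of A and B has to be chosen up front.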
With this in mind, in order to effectively explore all solutions along the Pareto front you do not want an implementation that encourages premature convergence, otherwise your algorithm will only explore the search space in one specific area of the Pareto curve. I would implement a selection operator that keeps all members of each iteration's optimal set of solutions, that is, all solutions which are not dominated by another, plus a parameter-controlled percentage of other solutions. This way you encourage exploration all along the Pareto curve.
You also need to ensure your mutation and crossover operators are tuned correctly too. With any novel application of Evolutionary Algorithms, part of the problem is trying to identify an optimal parameter set for the problem domain... this is where it gets really interesting!!
The description is very vague, but assuming that you actually have an idea of what the function should look like and you're just wondering whether you need to modify it so that proportional selection can be used easily, then no. Regardless of fitness function, you should probably default to using something like tournament selection. Controlling selection pressure is one of the most important things you have to do in order to get consistently good results, and roulette wheel selection doesn't allow you that control. You typically get enormous pressure very early, which drives premature convergence. That might be preferable in a few cases, but it's not where I'd start my investigations.
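A minimal Python sketch of tournament selection, where the tournament size `k` is the knob that controls selection pressure. It only compares fitness values, so, unlike roulette wheel selection, it does not need them to be positive. The population, fitness values and function name are hypothetical.

```python
import random


def tournament_select(population, fitness, k, rng):
    """Return the fittest of k distinct individuals drawn at random.
    Larger k -> stronger selection pressure; k = 1 is uniform random selection."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


rng = random.Random(0)
pop = list("abcde")
fit = [3.0, 1.0, 4.0, 1.0, 5.0]
for k in (1, 2, 3):
    wins = sum(tournament_select(pop, fit, k, rng) == "e" for _ in range(5000))
    print(f"k={k}: best individual chosen {wins} times out of 5000")
```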
