I am trying to implement the gradient descent algorithm defined in https://arxiv.org/pdf/1808.03856.pdf.
The gradient could be shown like this:
How could I do it?
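Without the specific gradient from the paper, here is a minimal, generic gradient-descent loop to build on. The objective `grad_f` below is a hypothetical stand-in, not the paper's gradient; swap in the expression you derived:

```python
import numpy as np

# Hypothetical stand-in for the paper's gradient: here, the gradient
# of the scalar objective f(theta) = (theta - 3)^2.
def grad_f(theta):
    return 2.0 * (theta - 3.0)

theta = np.array([0.0])   # initial parameter value
lr = 0.1                  # step size (hyperparameter you must tune)
for _ in range(100):
    theta -= lr * grad_f(theta)
print(theta)              # converges toward the minimizer 3.0
```

The structure is the same regardless of the objective: evaluate the gradient at the current point, step against it, repeat until the updates become negligible.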
Question :
I'm looking for a 2D noise whose gradient always has a norm of 1, which is equivalent to saying that its isolines are always the same distance apart. It can be any type of noise, but its gradient must be continuous (and, if possible, the second derivative too). My goal is to implement it as a function in a fragment shader, but just having the mathematical principle would be enough.
To explain more graphically what I want, here is a classic gradient noise with isolines and simple lighting:
As you can see, the isoline density varies because the slope isn't constant.
In this second picture, you can see the exact same noise but with different lighting, which I made by normalizing the gradient of the first one:
This looks much more like what I'm looking for; however, as you can see, the isolines are still wrong. I just cheated to get the lighting I wanted, but I still don't have the noise itself.
Ways of thought :
During my research, I tried to do something similar to gradient noise (where the gradient is defined by a random vector at each grid point). I focused on a square grid, but a simplex grid would work too. I came across two main potential ways to solve the problem:
Finding the gradient of the noise first:
It is possible to recover a function from its gradient and one fixed value. The reason this doesn't work with the normalized gradient I used for the lighting in the second picture is that the curl of the gradient must be 0 everywhere (otherwise no continuous function can have that gradient). So the gradient I'm looking for must have a curl of 0, have a norm of 1, and integrate to zero along any path from one node to the next (because all nodes of a gradient noise have a value of 0).
norm of 1 :
I found three ways to deal with this constraint: define the gradient as (cos(a(x, y)), sin(a(x, y))), require that the dot product between the gradient and its derivative is 0, or simply require that the dot product of the gradient with itself is 1.
curl :
The derivative of the x component of the gradient with respect to y must equal the derivative of the y component with respect to x (which, with the trigonometric parametrization above, becomes cos(a)*da/dx = -sin(a)*da/dy).
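Both conditions can be checked symbolically. A small SymPy sketch, with the angle field a(x, y) left abstract, confirms that the trigonometric parametrization has unit norm by construction, and that its curl reduces to exactly the condition above:

```python
import sympy as sp

x, y = sp.symbols('x y')
a = sp.Function('a')(x, y)                # abstract angle field a(x, y)
gx, gy = sp.cos(a), sp.sin(a)             # unit-norm gradient ansatz

norm2 = sp.simplify(gx**2 + gy**2)        # cos^2 + sin^2 -> 1, always
curl = sp.diff(gy, x) - sp.diff(gx, y)    # must vanish for a potential to exist

print(norm2)                              # 1
print(sp.simplify(curl))                  # cos(a)*da/dx + sin(a)*da/dy
```

Setting the printed curl expression to zero and moving one term across gives cos(a)*da/dx = -sin(a)*da/dy, so the whole problem reduces to finding an angle field a(x, y) satisfying that single PDE.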
integral from a node to the next one :
I haven't investigated that part yet.
Finding the noise itself:
It solves the nodes = 0 problem easily, but the main one is still there: the norm of the gradient must be 1 everywhere.
Conclusion :
Of course, those are just ideas, and if your answer is completely different from them, I'll take it anyway (and with a big smile).
I am implementing a very complex Function in my research; it uses belief propagation in this layer. I have derived the gradient w.r.t. W (the parameter) of this layer, but because it's complex, I haven't derived the gradient w.r.t. input_data (the data coming from the former layer).
I am very confused about the details of backpropagation. I have searched a lot about the BP algorithm; some notes say it is OK to differentiate only w.r.t. W (the parameter) and use the residual to get the gradient. Your example suggests we also need to calculate the gradient w.r.t. the input data (the former layer's output). I am confused.
A very typical example: how do you derive the gradient w.r.t. the input image in a convolutional layer?
My network has two layers. Do I need to derive the gradient by hand w.r.t. the input X in the last layer? (backward needs to return gx in order for BP to let the gradient flow to the former layer)?
If you do not need the gradient w.r.t. the input, you can omit its computation. In this case, return None as the placeholder for the omitted input gradient. Note that the grad of the input after backprop will then be incorrect. If you want to write a Function that can be used in any context (including cases where someone wants the gradient w.r.t. the input), you have to compute the gradients w.r.t. all the inputs (except when the Function is not differentiable w.r.t. an input). This is why the built-in functions of Chainer compute gradients for all their inputs.
By the way, deriving the gradient w.r.t. the input image of a convolutional layer is simple: apply a transposed convolution (called "deconvolution" in Chainer for historical reasons) to the output gradient, using the same weight.
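To make this concrete, here is a plain NumPy sketch (not Chainer code) for the 1-D case: the gradient of a "valid" convolution w.r.t. its input is the upstream gradient scattered back through a transposed convolution with the same weights, which a finite-difference check confirms:

```python
import numpy as np

def conv1d(x, w):
    # "valid" cross-correlation, as used in conv layers
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

def conv1d_input_grad(gy, w, n_in):
    # transposed convolution: scatter each upstream gradient value
    # back over the input positions it touched in the forward pass
    gx = np.zeros(n_in)
    for i, g in enumerate(gy):
        gx[i:i + len(w)] += g * w
    return gx

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w = rng.standard_normal(3)
gy = rng.standard_normal(6)        # upstream gradient dL/dy

gx = conv1d_input_grad(gy, w, len(x))

# finite-difference check of dL/dx, where L = gy . conv1d(x, w)
eps = 1e-6
gx_fd = np.array([
    (gy @ conv1d(x + eps * e, w) - gy @ conv1d(x - eps * e, w)) / (2 * eps)
    for e in np.eye(len(x))
])
print(np.allclose(gx, gx_fd, atol=1e-5))  # True
```

The 2-D image case works the same way, just with the scatter running over both spatial dimensions and channels.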
Is there a way to select step size to do gradient descent when you only have access to gradient evaluations, but not function evaluations?
I know the function to be optimized over is convex, and given a point x, I have access to f'(x), but not f(x). Can I do anything other than a fixed step size rule for gradient descent in this case?
I am looking for algorithms to rasterize a linear gradient definition, i.e., to convert the linear gradient into pixel colors in RGB color space. I have already seen the algorithm used by PS/PDF, but I am more interested in web technologies.
Could someone please describe, or provide a reference on, how browsers (or WebKit) typically do this when rendering SVG/CSS?
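Not from any browser source, but the core idea is the same everywhere: project each pixel onto the gradient axis, normalize the projection to a parameter t, and interpolate between color stops. A minimal NumPy sketch for a two-stop gradient with "pad" spread (function name and parameters are my own):

```python
import numpy as np

def raster_linear_gradient(w, h, p0, p1, c0, c1):
    """Rasterize a two-stop linear gradient into an (h, w, 3) RGB array.

    Each pixel is projected onto the gradient axis p0 -> p1; the
    normalized projection t in [0, 1] linearly interpolates c0..c1,
    and t is clamped outside the axis ("pad" spread method)."""
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.subtract(p1, p0, dtype=float)
    t = ((xs - p0[0]) * d[0] + (ys - p0[1]) * d[1]) / (d @ d)
    t = np.clip(t, 0.0, 1.0)[..., None]
    return (1 - t) * np.asarray(c0, float) + t * np.asarray(c1, float)

# horizontal black-to-red ramp: the end pixels hold the two stop colors
img = raster_linear_gradient(4, 2, (0, 0), (3, 0), (0, 0, 0), (255, 0, 0))
print(img[0, 0], img[0, 3])
```

Real renderers extend this with multiple stops (piecewise interpolation over a sorted stop list), the repeat/reflect spread methods (t modulo or triangle-wave instead of clamp), premultiplied alpha, and often a precomputed 1-D lookup table indexed by t rather than per-pixel interpolation.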
I was working on a method to approximate the normal to the surface of a 3D voxel image.
The method suggested in this article (the only algorithm I found via Google) seems to work. The paper's method is to find the direction in which the surface varies the most, choose 2 points on the tangent plane using some procedure, and then take the cross product. Some Pascal code by the article's author, commented in Portuguese, implements this method.
However, using the gradient of f (with each partial derivative as a component of the vector) as the normal seems to work pretty well; I tested this along several circles on a voxellated sphere and got results that look correct in most spots (there are a few outliers that are off by about 30 degrees). This is very different from the method used in the paper, but it still works. What I don't understand is why the gradient of f = 1/dist calculated along the surface of an object should produce the normal.
Why does this procedure work? Is the sphere test just too much of a special case? Could you suggest a simpler method, or explain either of these methods?
Using the gradient of the volume as a normal for lighting is a standard technique in volume rendering.
If you interpret the value of a voxel as the opacity, the gradient will give you the direction of the greatest change in the opacity, which is similar to a surface normal.
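A minimal NumPy sketch of this idea (the sphere radius, grid size, and clipping ramp are arbitrary choices of mine): build a voxel sphere as a density field, take central differences at a surface voxel, and check that the negated, normalized gradient points along the known outward normal:

```python
import numpy as np

# voxel sphere as a smooth density field: 1 inside, 0 outside,
# with a one-voxel linear ramp at the shell around radius 10
n = 33
c = (n - 1) / 2
z, y, x = np.mgrid[0:n, 0:n, 0:n]
dist = np.sqrt((x - c)**2 + (y - c)**2 + (z - c)**2)
vol = np.clip(10.0 - dist, 0.0, 1.0)

def gradient_normal(v, i, j, k):
    # central differences in (x, y, z) order; negated so the normal
    # points outward, since density decreases toward the outside
    g = np.array([
        v[i, j, k + 1] - v[i, j, k - 1],
        v[i, j + 1, k] - v[i, j - 1, k],
        v[i + 1, j, k] - v[i - 1, j, k],
    ]) / 2.0
    return -g / np.linalg.norm(g)

# surface voxel on the +x axis: true outward normal is (1, 0, 0)
nrm = gradient_normal(vol, int(c), int(c), int(c) + 10)
print(nrm)  # [1. 0. 0.]
```

This also hints at the answer to the outliers: central differences sample only the six face neighbors, so on a binary (unsmoothed) voxelization the estimate is quantized and can be tens of degrees off; smoothing the field first, or using a wider stencil such as a 3x3x3 Sobel operator, is the usual fix in volume rendering.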