What is the argument "grad_outputs" in Chainer's backward function?

3 questions:
What is grad_outputs in Chainer?
For example, in Chainer's F.transpose function, how should this backward code be understood?
def backward(self, inputs, grad_outputs):
    gy = grad_outputs[0]
    inv_axes = self.axes
    if self.axes:
        axes = tuple(ax % len(self.axes) for ax in self.axes)
        inv_axes = tuple(numpy.argsort(axes))
    gx = gy.transpose(inv_axes)
    return gx,
Suppose I want to implement a self-defined function whose inputs[0] and inputs[1] have different shapes. To backpropagate using the chain rule, I have to write the following code in backward:
a, b = inputs
gy = grad_outputs[0]
return a * gy, b * gy
But a and b do not have the same shape, so won't a * gy and b * gy raise an error because the shapes don't match for multiplication?

*This answer applies to Chainer v2; the Function class's internal behavior may change after Chainer v3 to support differentiable backpropagation.
Backpropagation proceeds from the final layer to the first layer, propagating gradients so that the gradient of each layer's parameters can be calculated.
A function's backward method receives the gradient of its output and must calculate and return the gradient of its input.
grad_outputs holds the gradients of this function's outputs as arrays (NumPy or CuPy), one per output; that is why the code takes grad_outputs[0].
I believe the basic idea is that the derivative of F.transpose is also just a transpose, so backward simply returns the transposed gradient of the output, gy.
Strictly speaking, however, F.transpose's axis order is specified in the forward computation and kept as self.axes, and it has to be inverted in the backward computation. inv_axes is the inverse permutation of those axes (numpy.argsort of a permutation returns its inverse, and the ax % len(self.axes) step first turns negative axes into non-negative ones). It is used to compute the gradient of the input, gx.
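A minimal sketch of why argsort gives the right axes, using plain Python tuples instead of NumPy arrays (the helper names invert_permutation and permute are hypothetical, not part of Chainer). If the forward pass puts input axis axes[k] at output position k, the backward pass must send output axis k back to axes[k], which is exactly the inverse permutation:

```python
from itertools import product


def invert_permutation(axes):
    """Return inv with inv[axes[k]] == k; numpy.argsort(axes) gives the same for a permutation."""
    n = len(axes)
    # Normalise negative axes, as the Chainer code does with ax % len(self.axes)
    axes = [ax % n for ax in axes]
    inv = [0] * n
    for k, ax in enumerate(axes):
        inv[ax] = k
    return tuple(inv)


def permute(seq, axes):
    """Reorder a shape or an index tuple the way transpose(axes) reorders dimensions."""
    return tuple(seq[ax] for ax in axes)


shape = (2, 3, 4)
axes = (1, 2, 0)
inv_axes = invert_permutation(axes)
out_shape = permute(shape, axes)  # shape of y = x.transpose(axes)
print(inv_axes, out_shape, permute(out_shape, inv_axes))  # prints (2, 0, 1) (3, 4, 2) (2, 3, 4)

# Every element index of x survives the round trip x -> y -> gx
for idx in product(*(range(s) for s in shape)):
    assert permute(permute(idx, axes), inv_axes) == idx
```

Because gx = gy.transpose(inv_axes) undoes the forward transpose, gx always has the shape of the original input.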
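As you wrote, you can return the gradients of the inputs as a tuple, like return a * gy, b * gy. The inputs may have different shapes; what matters is that each returned gradient has the same shape as its corresponding input (and each element of grad_outputs has the shape of the corresponding output). So an expression like a * gy is only a valid gradient if its result has the shape of the input it belongs to.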
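To make the shape rule concrete, here is a minimal sketch in plain Python (no Chainer), assuming a hypothetical two-input function y = x + b where x is an n×m matrix and b is a length-m bias broadcast over the rows. The forward/backward functions only mimic the tuple-in, tuple-out convention of Chainer v2's Function; they are not Chainer's API. Each returned gradient has the shape of its own input:

```python
def forward(inputs):
    x, b = inputs  # x: n x m nested list, b: list of length m (different shapes)
    y = [[xij + bj for xij, bj in zip(row, b)] for row in x]
    return (y,)


def backward(inputs, grad_outputs):
    x, b = inputs
    gy = grad_outputs[0]  # same shape as y, i.e. n x m
    gx = [list(row) for row in gy]  # dy/dx is the identity, so gx has x's shape
    # b was broadcast over the rows of x, so its gradient sums gy over them -> length m
    gb = [sum(row[j] for row in gy) for j in range(len(b))]
    return gx, gb


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [10.0, 20.0, 30.0]
(y,) = forward((x, b))
gx, gb = backward((x, b), ([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],))
print(gx, gb)  # gx matches x's 2x3 shape, gb matches b's length 3
```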

Related

3D Projection Modification - Encode Z/W into Z

This is a little tricky to explain, so bear with me. I'm attempting to design a 2D projection matrix that takes 2D pixel coordinates along with a custom world-space depth value and converts them to clip space.
The idea is that it would allow drawing elements based on screen coordinates, but at specific depths, so that these elements would interact on the depth buffer with normal 3D elements. However, I want x and y coordinates to remain the same scale at every depth. I only want depth to influence the depth buffer, and not coordinates or scale.
After the vertex shader, the GPU sets depth_buffer = z/w. However, it also divides x and y by w (x/w and y/w), which creates the depth scaling I want to avoid. This means I must make sure my final clip-space w coordinate ends up being 1.0. I could instead scale x and y by w to cancel out the divide, but I would rather do the former, if possible.
This is the process that my 3D projection matrix uses to convert depth into clip space (d = depth, n = near distance, f = far distance)
z = f/(f-n) * d + f/(f-n) * -n;
w = d;
This is how I would like to setup my 2D projection matrix. Compared to the 3D version, it would divide both attributes by the input depth. This would simulate having z/w encoded into just the z value.
z = ( f/(f-n) * d + f/(f-n) * -n ) / d;
w = d / d;
I think this simplifies to:
r = f/(f-n); // for less crazy math
z = r + ( r * -n ) / d;
w = 1.0;
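Written out with the question's symbols, the simplification above and the range it produces look like this (a worked check, not part of the original post). The r·n/d term varies with 1/d, which a fixed matrix with w = 1 cannot produce, since every clip-space output would then be an affine function of d:

```latex
r = \frac{f}{f-n}, \qquad z_{\text{clip}} = r\,d - r\,n, \qquad w_{\text{clip}} = d \\
\frac{z_{\text{clip}}}{w_{\text{clip}}} = r - \frac{r\,n}{d}, \qquad
d = n \;\Rightarrow\; r - r = 0, \qquad
d = f \;\Rightarrow\; \frac{f}{f-n} - \frac{n}{f-n} = 1
```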
However, I can't seem to wrap my math around the values that I would need to plug into my matrix to get this result. It looks like I would need to set my matrix up to perform a division by depth. Is that even possible? Can anyone help me figure out the values I need to plug into my matrix at m[2][2] and m[3][2] (m._33 and m._43) to make something like this happen?
Note my 3D projection matrix uses the following properties to generate the final z value:
m._33 = f / (f-n); // depth scale
m._43 = -(f / (f-n)) * n; // depth offset
Edit: After thinking about this a little more, I realized that the depth buffer value does not change linearly with depth, and I'm pretty sure a matrix can only apply a linear transformation to its input. If that is the case, then what I'm trying to do wouldn't be possible. However, I'm still open to any ideas in the same ballpark. I know that I can get what I want by simply doing pos.z /= pos.w; pos.w = 1; in the vertex shader, but I was really hoping to make it all happen in the projection matrix, if possible.
In case anyone else is attempting this: it cannot be done. A matrix multiply is a linear operation, so there is no way to divide by one of the input values with a matrix, unless the divisor is a constant, in which case you can multiply by 1/x instead. I resorted to performing the operation in the shader in the end.
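A small numeric sketch of the shader-side workaround, in Python rather than shader code (project_depth and screen_space_vertex are hypothetical names). Depth goes through the same z and w rows as the 3D matrix, then is divided by w by hand and w is set to 1, so the GPU's own divide changes nothing and x and y keep their screen-space values:

```python
def project_depth(d, near, far):
    """z and w rows of the question's 3D projection (m._33 * d + m._43, w = d)."""
    r = far / (far - near)
    return r * d - r * near, d


def screen_space_vertex(x_ndc, y_ndc, d, near, far):
    """Mimic the shader workaround: pos.z /= pos.w; pos.w = 1."""
    z, w = project_depth(d, near, far)
    return (x_ndc, y_ndc, z / w, 1.0)


near, far = 0.1, 100.0
for d in (near, 1.0, 10.0, far):
    x, y, z, w = screen_space_vertex(0.25, -0.5, d, near, far)
    print(d, x, y, round(z, 6), w)  # x and y never change with d; only z does
```

The depth values match what an ordinary 3D vertex at the same d would write, so the two kinds of geometry still sort correctly against each other in the depth buffer.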

Simulating interpolation on Brownian motion with Julia

I'm trying to interpolate a Brownian motion. The function does not return an error, but it seems like Julia does not put the values into the vector BM. Here is the code:
function interpolation(i,j,N,BM)
    if j-i>1
        k = sqrt((j-i)/((2^N))/4)
        d = (i+j)/2
        BM[d] =((BM[i]+BM[j])/2)+k*randn(1)
        BM = interpolation(i,d,N,BM)
        BM = interpolation(d,j,N,BM)
    end
end
plot(BM)
I think that your code could be simplified by using array views. That eliminates all of the extra parameters from your code and makes it easier to see what it is doing. The normalization that makes changes smaller for interior steps could be simplified as well.
So here is a stab at this simplification:
function fractal(x)
    if length(x) > 2
        n = length(x)
        mid = (n+1)÷2
        x[mid] = (x[1] + x[n])/2 + randn() * sqrt(n)
        fractal(@view x[1:mid])
        fractal(@view x[mid:n])
    end
end
And here is how to run it and plot the result:
using Plots
a = zeros(1024)
fractal(a)
plot(a, legend=false)
The point of the simplification is to highlight the idea that the algorithm involves:
Interpolate the middle value based on the end points
Do the same to the left and right halves of the array
If the array isn't big enough, just return
This approach avoids complicating the picture with all of the housekeeping and it worked first try, largely because, I think, I didn't have to keep all that stuff straight.
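For comparison, here is the same recursive midpoint idea as a Python sketch (not the answer's Julia code; fill_bridge and brownian_path are hypothetical names). It keeps the question's Brownian-bridge scaling, k = sqrt((j - i) / 2^N / 4), but uses an integer midpoint (in Julia, (i+j)/2 always yields a floating-point value), draws a single scalar normal (the question's randn(1) returns a one-element array), and mutates the list in place instead of reassigning it from the return value:

```python
import math
import random


def fill_bridge(path, i, j, dt, rng):
    """Fill path[i+1:j] in place by recursive midpoint displacement.

    path[i] and path[j] must already be set; dt is the time step between samples.
    """
    if j - i <= 1:
        return  # no interior points left
    mid = (i + j) // 2  # integer midpoint, so it is a valid index
    # Conditional std of a Brownian bridge at the midpoint of an interval of length T is sqrt(T / 4)
    k = math.sqrt((j - i) * dt / 4)
    path[mid] = (path[i] + path[j]) / 2 + k * rng.gauss(0.0, 1.0)
    fill_bridge(path, i, mid, dt, rng)
    fill_bridge(path, mid, j, dt, rng)


def brownian_path(levels, seed=None):
    """Sample standard Brownian motion on [0, 1] at 2**levels + 1 equally spaced times."""
    rng = random.Random(seed)
    n = 2 ** levels
    path = [0.0] * (n + 1)  # B(0) = 0
    path[n] = rng.gauss(0.0, 1.0)  # B(1) ~ N(0, 1)
    fill_bridge(path, 0, n, 1.0 / n, rng)
    return path


bm = brownian_path(10, seed=42)
print(len(bm), bm[0])  # 1025 0.0
```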

Differentiating a scalar with respect to matrix

I have a scalar function which is obtained by iterative calculations. I wish to differentiate it (i.e., find the directional derivative) with respect to a matrix, elementwise. How should I apply a finite-difference approximation in this case? Do diff or gradient help here? Note that I only want numerical derivatives.
The typical code that I would work on is:
n=4;
for i=1:n
    for x(i)=-2:0.04:4;
        for y(i)=-2:0.04:4;
            A(:,:,i)=[sin(x(i)), cos(y(i));2*sin(x(i)),sin(x(i)+y(i)).^2];
            B(:,:,i)=[sin(x(i)), cos(x(i));3*sin(y(i)),cos(x(i))];
            R(:,:,i)=horzcat(A(:,:,i),B(:,:,i));
            L(i)=det(B(:,:,i)'*A(:,:,i)B)(:,:,i));
            %how to find gradient of L with respect to x(i), y(i)
            grad_L=tr((diff(L)/diff(R)')*(gradient(R))
        endfor;
    endfor;
endfor;
I know that the last line, for grad_L, gives an error saying the dimensions don't match. How do I proceed? Note that the gradient or directional derivative of a scalar function f of a matrix variable X is given by nabla(f) = trace((partial f / partial x_{ij}) * X_dot), where x_{ij} denotes the elements of the matrix and X_dot denotes the gradient of the matrix X.
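Numerically, the elementwise derivative in that formula can be estimated with central differences, one matrix entry at a time. The sketch below is plain Python with hypothetical helper names. It builds G[i][j] ≈ ∂f/∂x_{ij} and contracts it with a direction matrix V as the sum of G[i][j]·V[i][j], which equals trace(G^T V); whether the trace needs that transpose depends on the layout convention used for ∂f/∂X. The check uses f(X) = det(X) for a 2×2 matrix, whose gradient is the cofactor matrix [[x22, -x21], [-x12, x11]]:

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matrix_gradient(f, X, h=1e-6):
    """Central-difference estimate of G[i][j] = df/dX[i][j]."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((f(Xp) - f(Xm)) / (2 * h))
        G.append(row)
    return G


def directional_derivative(f, X, V, h=1e-6):
    """Sum of G[i][j] * V[i][j], i.e. trace(G^T V)."""
    G = matrix_gradient(f, X, h)
    return sum(G[i][j] * V[i][j] for i in range(len(X)) for j in range(len(X[0])))


X = [[1.0, 2.0], [3.0, 4.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(matrix_gradient(det2, X))  # close to [[4, -3], [-2, 1]]
print(directional_derivative(det2, X, V))  # close to 5
```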
Both your code and explanation are very confusing. You're using an iteration of n = 4, but you don't do anything with your inputs or outputs, and you overwrite everything. So I will ignore the n aspect for now since you don't seem to be making any use of it. Furthermore you have many syntactical mistakes which look more like maths or pseudocode, rather than any attempt to write valid Matlab / Octave.
But essentially, you seem to be saying: "I have a function which, for each (x,y) coordinate on a 2D grid, calculates a scalar output L(x,y)", where the calculation of L involves multiplying matrices and then taking the determinant of the product. Here's how to produce such an array L:
X = -2 : 0.04 : 4;
Y = -2 : 0.04 : 4;
X_indices = 1 : length(X);
Y_indices = 1 : length(Y);
L = zeros(length(Y), length(X));   % preallocate; rows follow Y, columns follow X
for Ind_x = X_indices
    for Ind_y = Y_indices
        x = X(Ind_x); y = Y(Ind_y);
        A = [sin(x), cos(y); 2 * sin(x), sin(x+y)^2];
        B = [sin(x), cos(x); 3 * sin(y), cos(x) ];
        L(Ind_y, Ind_x) = det(B.' * A * B);   % gradient() treats columns as the x direction
    end
end
You then want to obtain the gradient of L, which is of course vector-valued: at every grid point it has an x component and a y component. Ignoring the maths you mentioned for a second, if you're basically trying to use the gradient function correctly, apply it directly to L, pass the grid vectors X and Y so that it knows the spacing between the elements of L, and request two outputs so that you capture both the x and y components of the gradient:
[gLx, gLy] = gradient(L, X, Y);
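If only a single point is needed rather than the whole grid, the same quantity can be sketched in Python with central differences (hypothetical helper names; the matrices A and B are copied from the code above). The identity det(B^T A B) = det(A)·det(B)^2 makes a convenient sanity check for the matrix products:

```python
import math


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul2(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose2(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]


def L(x, y):
    A = [[math.sin(x), math.cos(y)], [2 * math.sin(x), math.sin(x + y) ** 2]]
    B = [[math.sin(x), math.cos(x)], [3 * math.sin(y), math.cos(x)]]
    return det2(matmul2(matmul2(transpose2(B), A), B))


def grad_central(f, x, y, h=1e-5):
    """(df/dx, df/dy) by central differences with step h."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))


print(grad_central(L, 0.3, 0.7))
```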

Simulate a bouncing ball?

Is it possible to create a simple model of a bouncing ball, using Julia's equation solvers?
I started with this:
using ODE
function bb(t, f)
    (y, v) = f
    dy_dt = v
    dv_dt = -9.81
    [dy_dt, dv_dt]
end
const y0 = 50.0 # height
const v0 = 0.0 # velocity
const startpos = [y0; v0]
ts = 0.0:0.25:10 # time span
t, res = ode45(bb, startpos, ts)
which produces useful-looking numbers:
julia> t
44-element Array{Float64,1}:
0.0
0.0551392
0.25
0.5
0.75
1.0
⋮
8.75
9.0
9.25
9.5
9.75
10.0
julia> res
44-element Array{Array{Float64,1},1}:
[50.0,0.0]
[49.9851,-0.540915]
[49.6934,-2.4525]
[48.7738,-4.905]
[47.2409,-7.3575]
⋮
[-392.676,-93.195]
[-416.282,-95.6475]
[-440.5,-98.1]
But somehow it needs to intervene when the height is 0, and reverse the velocity. Or am I on the wrong track?
DifferentialEquations.jl offers sophisticated callbacks and event handling. Since the DifferentialEquations.jl algorithms are about 10x faster while offering a higher-order interpolation, they are clearly the better choice here anyway.
The DifferentialEquations.jl documentation shows how to do the event handling. The easy interface uses macros. I start by defining the function:
f = @ode_def BallBounce begin
    dy = v
    dv = -g
end g=9.81
Here I am showing ParameterizedFunctions.jl to make the syntax nicer, but you can define the function directly as an in-place update f(t,u,du) (like Sundials.jl). Next you define the function which determines when an event takes place. It can be any function which is positive and hits zero at the event time. Here, we are checking for when the ball hits the ground, or for when y=0, so:
function event_f(t,u) # Event when event_f(t,u) == 0
    u[1]
end
Next you say what to do when the event occurs. Here we want to reverse the sign of the velocity:
function apply_event!(u,cache)
    u[2] = -u[2]
end
You put these functions together to build the callback using the macros:
callback = @ode_callback begin
    @ode_event event_f apply_event!
end
Now you solve as usual. You define the ODEProblem using f and the initial condition, and you call solve on a timespan. The only thing extra is you pass the callback along with the solver:
u0 = [50.0,0.0]
prob = ODEProblem(f,u0)
tspan = [0;15]
sol = solve(prob,tspan,callback=callback)
Then we can use the plot recipe to automatically plot the solution:
plot(sol)
The result is this:
A few things to notice here:
DifferentialEquations.jl will automatically use an interpolation to check for the event more safely. For example, if the event happened within a timestep but not at its ends, DifferentialEquations.jl will still find it. More or fewer interpolation points can be requested through options to the @ode_event macro.
DifferentialEquations.jl uses a rootfinding method to home in on the moment of the event. Even though the adaptive solver steps past the event, rootfinding on the interpolation accurately finds the time of the event and thus gets the discontinuity right. You can see that in the graph, since the ball never goes negative.
There is a whole lot more this can do. Check out the docs. You can do pretty much anything with this. For example, have your ODE changing size over the run to model a population of cells with birth and deaths. This is something other solver packages can't do.
Even with all of these features, speed is not compromised.
Let me know if you need any extra functionality added to the "ease of use" interface macros.
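The same idea (step until the height changes sign, locate the crossing by root finding, then flip the velocity) can be sketched in plain Python without any ODE package. This only illustrates event handling; it is not how DifferentialEquations.jl implements it. It assumes a perfectly elastic bounce and uses the exact constant-acceleration update, so only the event time is approximated (by bisection):

```python
G = 9.81  # gravitational acceleration, m/s^2


def free_fall(y, v, dt):
    """Exact position and velocity after dt under constant acceleration -G."""
    return y + v * dt - 0.5 * G * dt * dt, v - G * dt


def crossing_time(y, v, dt, tol=1e-12):
    """Bisection for the time in [0, dt] at which the height reaches 0."""
    lo, hi = 0.0, dt
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if free_fall(y, v, mid)[0] > 0.0:
            lo = mid
        else:
            hi = mid
    return hi


def simulate(y0, v0, t_end, dt=0.01):
    """Elastic bouncing ball; returns lists of times and heights."""
    t, y, v = 0.0, y0, v0
    ts, ys = [t], [y]
    while t < t_end:
        h = min(dt, t_end - t)
        y_new, v_new = free_fall(y, v, h)
        if y_new < 0.0:
            # Event: the ball hits the ground inside this step
            tc = crossing_time(y, v, h)
            _, v_hit = free_fall(y, v, tc)
            t, y, v = t + tc, 0.0, -v_hit  # reverse the velocity
        else:
            t, y, v = t + h, y_new, v_new
        ts.append(t)
        ys.append(y)
    return ts, ys


ts, ys = simulate(50.0, 0.0, 15.0)
print(min(ys), max(ys))
```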
Somewhat hacky:
function bb(t, f)
    (y, v) = f
    dy_dt = v
    dv_dt = -9.81*sign(y)
    [dy_dt, dv_dt]
end
where you just follow a convention where y and -y refer to the same heights. You can then plot the trajectory of the bouncing ball by just plotting abs(y).
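A plain-Python sketch of that trick (not the answer's Julia code; mirrored_fall is a hypothetical name), using a fixed-step semi-implicit Euler integrator in place of ode45. The sign(y) term makes the ball decelerate once it passes y = 0, and abs(y) is the bouncing height. Because the force switches abruptly at y = 0, a fixed step introduces a small error at each crossing, so the peaks may drift slightly:

```python
def mirrored_fall(y0, v0, t_end, dt=1e-3, g=9.81):
    """Integrate dy/dt = v, dv/dt = -g*sign(y) with semi-implicit Euler.

    Heights y and -y mean the same thing, so abs(y) is the bouncing trajectory.
    """
    def sign(u):
        return (u > 0) - (u < 0)

    y, v = y0, v0
    heights = [abs(y)]
    for _ in range(int(round(t_end / dt))):
        v += -g * sign(y) * dt  # update velocity first (semi-implicit Euler)
        y += v * dt
        heights.append(abs(y))
    return heights


heights = mirrored_fall(50.0, 0.0, 12.0)
print(max(heights), min(heights))
```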

What's a simple way of warping an image with a given set of points?

I'd like to implement image morphing, for which I need to be able to deform an image using a given set of points and their destination positions (where they will be "dragged"). I am looking for a simple solution that gets the job done; it doesn't have to look great or be extremely fast.
Here is an example of what I need:
Let's say I have an image and a set of only one deforming point [0.5,0.5] which will have its destination at [0.6,0.5] (equivalently, its movement vector is [0.1,0.0]). This means I want to move the very center pixel of the image by 0.1 to the right. Neighboring pixels within some radius r should, of course, be "dragged along" a little with this pixel.
My idea was to do it like this:
I'll make a function mapping the source image positions to destination positions depending on the deformation point set provided.
I will then have to find the inverse function of this function, because I have to perform the transformation by going through destination pixels and seeing "where the point had to come from to come to this position".
My function from step 1 looked like this:
p2 = p1 + ( 1 / ( (distance(p1,p0) / r)^2 + 1 ) ) * s
where
p0 ([x,y] vector) is the deformation point position.
p1 ([x,y] vector) is any given point in the source image.
p2 ([x,y] vector) is the position, to where p1 will be moved.
s ([x,y] vector) is the movement vector of the deformation point; it says in which direction and how far p0 will be dragged.
r (scalar) is the radius, just some number.
I have problem with step number 2. The calculation of the inverse function seems a little too complex to me and so I wonder:
If there is an easy solution for finding the inverse function, or
if there is a better function for which finding the inverse function is simple, or
if there is an entirely different way of doing all this that is simple?
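For step 2 with this particular function, one simple option (a swapped-in technique, not something the answers here propose) is to invert it numerically per pixel with fixed-point iteration: since p2 = p1 + w(p1)·s, the source point satisfies p1 = p2 - w(p1)·s. A minimal Python sketch with hypothetical function names, assuming a single deformation point:

```python
import math


def forward(p1, p0, s, r):
    """The question's map: p2 = p1 + s / ((|p1 - p0| / r)^2 + 1)."""
    d = math.dist(p1, p0)
    w = 1.0 / ((d / r) ** 2 + 1.0)
    return (p1[0] + w * s[0], p1[1] + w * s[1])


def inverse(p2, p0, s, r, iterations=50):
    """Fixed-point iteration p1 <- p2 - w(p1) * s.

    Converges when |s| is less than about 1.5 * r, because the weight w
    changes by at most about 0.65 / r per unit distance.
    """
    p1 = p2
    for _ in range(iterations):
        d = math.dist(p1, p0)
        w = 1.0 / ((d / r) ** 2 + 1.0)
        p1 = (p2[0] - w * s[0], p2[1] - w * s[1])
    return p1


p0, s, r = (0.5, 0.5), (0.1, 0.0), 0.2
q = (0.6, 0.5)  # a destination pixel position
p = inverse(q, p0, s, r)  # where it came from in the source
print(p, forward(p, p0, s, r))  # the second tuple is close to q
```

For larger drags relative to r the iteration may stop converging, which is one reason the simpler "use the forward function as the inverse" approach below is attractive.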
Here's the solution in Python - I did what Yves Daoust recommended and simply tried to use the forward function as the inverse function (switching the source and destination). I also altered the function slightly, changing exponents and other values produces different results. Here's the code:
from PIL import Image
import math
def vector_length(vector):
    return math.sqrt(vector[0] ** 2 + vector[1] ** 2)
def points_distance(point1, point2):
    return vector_length((point1[0] - point2[0], point1[1] - point2[1]))
def clamp(value, minimum, maximum):
    return max(min(value, maximum), minimum)
## Warps an image according to given points and shift vectors.
#
# @param image input image (RGB)
# @param points list of (x, y, dx, dy) tuples; each shift (dx, dy) must be non-zero
# @return warped image
def warp(image, points):
    result = Image.new("RGB", image.size, "black")
    image_pixels = image.load()
    result_pixels = result.load()
    for y in range(image.size[1]):
        for x in range(image.size[0]):
            offset = [0, 0]
            for point in points:
                # Work backwards from the destination: where did this pixel come from?
                point_position = (point[0] + point[2], point[1] + point[3])
                shift_vector = (point[2], point[3])
                helper = 1.0 / (3 * (points_distance((x, y), point_position) / vector_length(shift_vector)) ** 4 + 1)
                offset[0] -= helper * shift_vector[0]
                offset[1] -= helper * shift_vector[1]
            coords = (clamp(x + int(offset[0]), 0, image.size[0] - 1), clamp(y + int(offset[1]), 0, image.size[1] - 1))
            result_pixels[x, y] = image_pixels[coords[0], coords[1]]
    return result
image = Image.open("test.png").convert("RGB")  # match the RGB result image
image = warp(image, [(210, 296, 100, 0), (101, 97, -30, -10), (77, 473, 50, -100)])
image.save("output.png", "PNG")
You don't need to construct the direct function and invert it. Directly compute the inverse function, by swapping the roles of the source and destination points.
You need some form of bivariate interpolation; have a look at radial basis function interpolation. It requires solving a linear system of equations.
Inverse distance weighting (similar to your proposal) is the easiest to implement but I am afraid it will give disappointing results.
https://en.wikipedia.org/wiki/Multivariate_interpolation#Irregular_grid_.28scattered_data.29
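A minimal sketch of inverse distance weighting used as a backward map, in plain Python (idw_source_position and the power parameter are hypothetical choices, not from either answer). Each control point is given as a destination position plus the source position it should come from. A destination pixel's source is its own position plus the inverse-distance-weighted average of the control points' (source - destination) offsets, which reproduces each control point exactly:

```python
import math


def idw_source_position(q, controls, power=2.0):
    """Backward map for a destination pixel q.

    controls: list of (dest, src) pairs of (x, y) tuples.
    Returns q plus the IDW-weighted average of (src - dest) offsets.
    """
    total_w = 0.0
    ox = oy = 0.0
    for dest, src in controls:
        d = math.dist(q, dest)
        if d == 0.0:
            return src  # exactly on a control point: reproduce it
        w = 1.0 / d ** power
        total_w += w
        ox += w * (src[0] - dest[0])
        oy += w * (src[1] - dest[1])
    return (q[0] + ox / total_w, q[1] + oy / total_w)


# Drag the pixel at (50, 50) to (60, 50); pin the corners of a 100x100 image in place
controls = [((60.0, 50.0), (50.0, 50.0)),
            ((0.0, 0.0), (0.0, 0.0)), ((99.0, 0.0), (99.0, 0.0)),
            ((0.0, 99.0), (0.0, 99.0)), ((99.0, 99.0), (99.0, 99.0))]
print(idw_source_position((60.0, 50.0), controls))  # (50.0, 50.0)
print(idw_source_position((30.0, 70.0), controls))
```

Sampling the source image at the returned position for every destination pixel gives the warp, in the same backward style as the PIL code above.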

Resources