Why must the backward function of a self-defined Chainer function return gradients with the same shape as the inputs? - chainer

Two questions:
1. https://docs.chainer.org/en/stable/tutorial/function.html says that the backward function must return arrays with the same shape as the arguments of the forward method. But in some cases the input data and the parameters need not have the same shape or length, for example in Convolution2D. How do I deal with input data and parameters that have different shapes?
2. In some cases, such as max pooling, there seems to be no gradient. How do I define such a Chainer function?

The backward method should return a tuple of arrays, and the i-th array of the tuple should have the same shape as the i-th argument of the forward method. Of course, the different arguments of forward (and thus different return values of backward) can have different shapes.
When a function does not have a gradient w.r.t. some of its inputs (i.e., the gradient is always zero), you can return None for the corresponding elements of the tuple instead of zero-filled arrays. By the way, max pooling does have gradients.
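As a concrete illustration, here is a minimal sketch in the spirit of the MulAdd example from the linked tutorial, using the classic chainer.Function API (the element-wise function y = x * w + b is chosen only for illustration):

from chainer import Function

class MulAdd(Function):
    def forward_cpu(self, inputs):
        x, w, b = inputs
        y = x * w + b
        return y,                      # always return a tuple of outputs

    def backward_cpu(self, inputs, grad_outputs):
        x, w, b = inputs
        gy, = grad_outputs
        gx = gy * w                    # same shape as x
        gw = gy * x                    # same shape as w
        gb = gy                        # same shape as b
        return gx, gw, gb              # one gradient per forward argument

Each returned gradient matches the shape of its own argument, so the arguments are free to differ in shape from one another (as the input and the filter weights of Convolution2D do). If, say, b needed no gradient, backward_cpu could return (gx, gw, None) instead.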

Related

Is there a particular use case for the fold() function?

When accumulating a collection of values (any collection, not just a list) into a single value, there are two options.
reduce(). Which takes a List<T>, and a function (T, T) -> T, and applies that function iteratively until the whole list is reduced into a single value.
fold(). Which takes a List<T>, an initial value V, and a function (V, T) -> V, and applies that function iteratively until the whole list is folded into a single value.
I know that both of them have their own use cases. For example, reduce() can be used to find the maximum value in a list, and fold() can be used to find the sum of all values in a list.
But in that example, instead of using fold(), you could add(0) to the list and then reduce(). Another use case of fold() is to join all elements into a string, but this can also be done without fold(), by map |> toString() followed by reduce().
Just out of curiosity, the question is, can every use case of fold() be avoided given functions map(), filter(), reduce() and add()? (also remove() if required.)
It's the other way around. reduce(L,f) = fold(first(L), rest(L), f), so there's no special need for reduce -- it's just a short form for a common fold pattern.
fold has lots of use cases of its own, though.
The example you gave for string concatenation is one of them -- you can fold items into a dedicated string accumulator (a string builder) much more efficiently than you can build strings by repeated concatenation. (Exactly how depends on the language, but it's true pretty much everywhere.)
Applying a list of incremental changes to a target object is a pretty common pattern. Adding files to a folder, drawing shapes on a canvas, turning a list into a set, crossing off completed items in a to-do list, etc., are all examples of this pattern.
Also map(L,f) = fold(newMap(), L, (M,v) -> add(M,f(v))), so map is also just a common fold pattern. Similarly, filter(L,f) = fold(newList(), L, (L,v) -> f(v) ? add(L,v) : L).
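For a concrete (if language-specific) illustration of these identities, Python's functools.reduce takes an optional initializer, so with the initializer it plays the role of fold; the map and filter reductions below mirror the pseudocode above:

from functools import reduce

xs = [3, 1, 4, 1, 5]

# fold: reduce with an explicit initial value
total = reduce(lambda acc, x: acc + x, xs, 0)                        # 14

# reduce: a fold seeded with the first element of the list
maximum = reduce(lambda acc, x: max(acc, x), xs)                     # 5

# map and filter expressed as folds
doubled = reduce(lambda acc, x: acc + [2 * x], xs, [])               # [6, 2, 8, 2, 10]
odds = reduce(lambda acc, x: acc + [x] if x % 2 else acc, xs, [])    # [3, 1, 1, 5]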

Functions as first class objects in SageMath

How can I represent a function from a set X to a set Y in SageMath?
I want to enumerate all functions from a set X to a set Y. How should I represent the values yielded by the iterator that enumerates them?
I know I can just use a Python dict, but maybe there is already a more suitable object in Sage? (one that would be internally represented as a dict but act as a function when applied to an argument)
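In the absence of a dedicated Sage object, a minimal plain-Python sketch of the dict-based idea described above might look like this (X, Y, and the helper name are made up for illustration):

from itertools import product

X = [1, 2, 3]
Y = ['a', 'b']

def all_functions(X, Y):
    # Enumerate every function X -> Y as a lookup table wrapped in a callable.
    for values in product(Y, repeat=len(X)):
        table = dict(zip(X, values))
        yield lambda x, table=table: table[x]   # default arg pins the table

for f in all_functions(X, Y):
    print([f(x) for x in X])   # len(Y)**len(X) functions in total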

Function doesn't change value (R)

I have written a function that takes two arguments: an integer between 0 and 16, and a vector which contains four parameter values.
The output of the function changes if I change the parameters in the vector, but it does not change if I change the integer between 0 and 16.
I should add that the function I'm having trouble with calls another function (called 'pi') which takes the same arguments.
I have checked that the 'pi' function does actually change value if I change the integer between 0 and 16 (and it also changes if I change the values of the parameters).
First, here is my code:
pterm_ny <- function(x, theta){
  (1 - sum(theta[1:2])) * (theta[4]^x) * exp(-theta[4]) / pi(x, theta)
}
pi <- function(x, theta){
  theta[1] * (x == 0) +
    theta[2] * (theta[3]^x) * exp(-theta[3]) +
    (1 - sum(theta[1:2])) * (theta[4]^x) * exp(-theta[4])
}
This returns 0.75 for pterm_ny(i, c(0.2,0.2,2,2)), where i = 1,...,16, and 0.2634 for i = 0, which tells me that the indicator-function part of 'pi' does work.
With respect to raising a number to a power, I have been told that one should wrap the exponent in I(), for example like this:
x^I(2)
I have tried to do that in my code, but it didn't help either.
I can't remember the reason for doing it, but I expect it's to ensure that the number in parentheses is interpreted as an integer.
My end goal is to get 17 different values of 'pterm', and to accomplish that I was thinking of using sapply like this:
sapply(c(0:16),pterm_ny,theta = c(0.2,0.2,2,2))
I really hope that someone can point out what I'm missing here.
Thank you in advance!
You have a theta[4]^x term in both your main expression and your pi() function, and with your test parameters theta[3] == theta[4] == 2, so for x > 0 every nonzero term of pi() carries the same (theta[4]^x)*exp(-theta[4]) factor as the numerator. The x-dependent factors cancel, leaving the result invariant to changes in x: the ratio is (1 - 0.2 - 0.2)/(0.2 + 0.6) = 0.6/0.8 = 0.75.
Also:
you might want to avoid using pi as your function name, as it masks the built-in constant (3.14159...) - this can sometimes cause confusion
the advice about using the "as is" function I() to protect powers is only relevant within formulas, e.g. as used in lm() (linear regression). (It would be used as I(x^2), not x^I(2).)

How to initialize a convolution layer with an arbitrary kernel in Keras?

I want to initialize the convolution layer with a specific kernel that is not predefined in Keras. For instance, suppose I define the function below to initialize the kernel:
def init_f(shape):
    ker = np.zeros((shape, shape))
    ker[int(np.floor(shape/2)), int(np.floor(shape/2))] = 1
    return ker
And the convolution layer is designed as follows:
model.add(Conv2D(filters=32, kernel_size=(3,3),
                 kernel_initializer=init_f(3)))
I get the error:
Could not interpret initializer identifier
I have followed a similar issue at:
https://groups.google.com/forum/#!topic/keras-users/J46pplO64-8
But I could not adapt it to my code.
Could you please help me to define the arbitrary kernel in Keras?
A few items to fix. Let's start with the kernel initializer. From the documentation:
If passing a custom callable, then it must take the argument shape (shape of the variable to initialize) and dtype (dtype of generated values)
So the signature should become:
def init_f(shape, dtype=None)
The function will work without the dtype argument, but it's good practice to keep it there. That way you can pass the dtype on to calls inside your function, e.g.:
np.zeros(shape, dtype=dtype)
This also addresses your second issue: the shape argument is a tuple, so you just need to pass it straight to np.zeros and don't need to make another tuple.
I'm guessing you're trying to initialize the kernel with a 1 in the middle, so you could also generalize your function to work with whatever shape it receives:
ker[tuple(map(lambda x: int(np.floor(x/2)), ker.shape))]=1
Putting it all together:
def init_f(shape, dtype=None):
    ker = np.zeros(shape, dtype=dtype)
    ker[tuple(map(lambda x: int(np.floor(x/2)), ker.shape))] = 1
    return ker
One last problem. You need to pass the function to the layer, not the result of the call:
model.add(Conv2D(filters=32, kernel_size=(3,3),
                 kernel_initializer=init_f))
The layer will then call init_f with the appropriate shape and dtype arguments.
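As a usage example, a minimal end-to-end sketch might look like the following (the Sequential wrapper, imports, and input shape are my additions for illustration; they assume standalone Keras, so with tf.keras replace keras with tensorflow.keras):

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D

def init_f(shape, dtype=None):
    # Zero kernel with a single 1 at the centre of every dimension.
    ker = np.zeros(shape, dtype=dtype)
    ker[tuple(map(lambda x: int(np.floor(x/2)), ker.shape))] = 1
    return ker

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3),
                 kernel_initializer=init_f,      # pass the function, not init_f(3)
                 input_shape=(28, 28, 1)))       # hypothetical input shape
model.summary()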

New to OCaml: How would I go about implementing Gaussian Elimination?

I'm new to OCaml, and I'd like to implement Gaussian elimination as an exercise. I can easily do it with a stateful algorithm, meaning keeping a matrix in memory and recursively operating on it by passing a reference to it around.
This statefulness, however, smacks of imperative programming. I know there are capabilities in OCaml to do this, but I'd like to ask if there is some clever functional way I haven't thought of first.
OCaml arrays are mutable, and it's hard to avoid treating them just like arrays in an imperative language.
Haskell has immutable arrays, but from my (limited) experience with Haskell, you end up switching to monadic, mutable arrays in most cases. Immutable arrays are probably amazing for certain specific purposes. I've always imagined you could write a beautiful implementation of dynamic programming in Haskell, where the dependencies among array entries are defined entirely by the expressions in them. The key is that you really only need to specify the contents of each array entry one time. I don't think Gaussian elimination follows this pattern, and so it seems it might not be a good fit for immutable arrays. It would be interesting to see how it works out, however.
You can use a Map to emulate a matrix. The key would be a pair of integers referencing the row and column. You'll want to use your own get x y function to ensure x < n and y < n though, instead of accessing the Map directly. (edit) You can use the compare function in Pervasives directly.
module OrderedPairs = struct
  type t = int * int
  let compare = Pervasives.compare
end
module Pairs = Map.Make (OrderedPairs)

let get_ n set x y =
  assert (x < n && y < n);
  Pairs.find (x, y) set

let set_ n set x y v =
  assert (x < n && y < n);
  Pairs.add (x, y) v set
Actually, having a general set of functions (get x y and set x y at a minimum), without specifying the implementation, would be an even better option. The functions can then be passed to your algorithm, or be provided by a module through a functor (a better solution, but having a set of functions that just does what you need is a reasonable first step since you're new to OCaml). In this way you could use a Map, an Array, a Hashtbl, or even a set of functions accessing a file on the hard drive to implement the matrix. This is the really important aspect of functional programming: you trust the interface rather than exploiting side effects, and you don't worry about the underlying implementation, since it's presumed to be pure.
The answers so far are using/emulating mutable data-types, but what does a functional approach look like?
To see, let's decompose the problem into some functional components:
Gaussian elimination involves a sequence of row operations, so it is useful first to define a function taking 2 rows and scaling factors, and returning the resultant row operation result.
The row operations we want should eliminate a variable (column) from a particular row, so let's define a function which takes a pair of rows and a column index and uses the previously defined row operation to return the modified row with that column entry set to zero.
Then we define two functions, one to convert a matrix into triangular form, and another to back-substitute a triangular matrix to the diagonal form (using the previously defined functions) by eliminating each column in turn. We could iterate or recurse over the columns, and the matrix could be defined as a list, vector or array of lists, vectors or arrays. The input is not changed, but a modified matrix is returned, so we can finally do:
let out_matrix = to_diagonal (to_triangular in_matrix)
What makes it functional is not whether the data types (array or list) are mutable, but how they are used. This approach may not be particularly 'clever' or the most efficient way to do Gaussian elimination in OCaml, but using pure functions lets you express the algorithm cleanly.
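To make that decomposition concrete, here is a rough sketch of the first pieces, assuming the matrix is a float list list of rows; the helper names are made up, there is no pivoting, and pivots are assumed to be non-zero:

(* Row operation: subtract a scaled pivot row from a target row. *)
let row_op factor pivot_row target_row =
  List.map2 (fun p t -> t -. factor *. p) pivot_row target_row

(* Zero out column [col] of [row] using [pivot_row]. *)
let eliminate col pivot_row row =
  let factor = List.nth row col /. List.nth pivot_row col in
  row_op factor pivot_row row

(* Forward elimination: reduce the matrix to upper-triangular form.
   The input list is never mutated; a new matrix is returned. *)
let to_triangular m =
  let rec go col = function
    | [] -> []
    | pivot :: rest ->
        pivot :: go (col + 1) (List.map (eliminate col pivot) rest)
  in
  go 0 m

to_diagonal would apply the same eliminate step working upward from the last row; in both passes the original matrix is left untouched.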
