When to use individual optimizers in PyTorch?

The example given here uses two separate optimizers, one for the encoder and one for the decoder. Why? And when should you do it that way?

If you have multiple networks (in the sense of multiple objects that inherit from nn.Module), you have to do this for a simple reason: when constructing a torch.optim.Optimizer object, you pass it the parameters that should be optimized as an argument. In your case:
encoder_optimizer = optim.Adam(encoder.parameters(), lr=learning_rate)
decoder_optimizer = optim.Adam(decoder.parameters(), lr=learning_rate)
This also gives you the freedom to vary settings such as the learning rate independently. If you don't need that, you could create a new class that inherits from nn.Module and contains both networks, encoder and decoder, or build a set of parameters to give to the optimizer, as explained here:
nets = [encoder, decoder]
parameters = set()
for net in nets:
    parameters |= set(net.parameters())
where | is the union operator for sets in this context.
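As a rough sketch of the single-optimizer alternative, the snippet below builds one Adam optimizer over both networks, first with a shared learning rate and then with one parameter group per network so the learning rates can still differ. The two nn.Linear modules are hypothetical stand-ins for the encoder and decoder, and the layer sizes and learning rates are arbitrary values chosen for illustration.

```python
import itertools

import torch
from torch import nn, optim

# Hypothetical stand-ins for the encoder and decoder from the question.
encoder = nn.Linear(8, 4)
decoder = nn.Linear(4, 8)

# Option 1: one optimizer over the chained parameters, one shared learning rate.
shared_optimizer = optim.Adam(
    itertools.chain(encoder.parameters(), decoder.parameters()), lr=1e-3
)

# Option 2: one optimizer with a parameter group per network, so each
# network can still get its own learning rate.
grouped_optimizer = optim.Adam(
    [
        {"params": encoder.parameters(), "lr": 1e-3},
        {"params": decoder.parameters(), "lr": 1e-4},
    ]
)

# A single optimizer step now updates both networks.
x = torch.randn(16, 8)
loss = ((decoder(encoder(x)) - x) ** 2).mean()
grouped_optimizer.zero_grad()
loss.backward()
grouped_optimizer.step()
```

Chaining the parameter iterators (or collecting them in a list) keeps the parameters in a fixed order, which a set does not guarantee.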


Julia image feature extraction using EfficientNet.jl

I am trying to use EfficientNet.jl as a feature extractor, meaning I want to extract all features after a given block in the Flux chain.
There is the built-in function
features = model(x, Val(:stages))
which returns all features after each block, which is very memory inefficient, since I only need to store values after exactly 3 blocks.
My thought was to only use a subset of the layers this way:
transparent_model = model.blocks[1:model.stages[3]]
features = transparent_model(x)
Unfortunately, I get the following error:
DimensionMismatch("Input channels must match! (3 vs. 1)")
which is in my opinion just due to a bad error message.
size(x) -> (1280,720,3,1)

OpenMDAO 1.x relevance reduction

I have a component in OpenMDAO without outputs that serves to provide inputs to the rest of the group. apply_linear in that component is being called despite the fact that the output of it is not connected. Shouldn't the relevance reduction algorithm in OpenMDAO 1.x figure out that apply_linear for this method never needs to be called?
As it turns out, relevance reduction on a per-variable basis isn't turned on by default. You can turn it on with:
prob.root.ln_solver = LinearGaussSeidel()
prob.root.ln_solver.options['single_voi_relevance_reduction'] = True
This option is set to False by default because it uses more memory: it allocates a separate vector for each quantity of interest. Each vector is smaller because it only contains relevant variables, but the total size may be larger. Also, relevance reduction is only applicable when using Linear Gauss Seidel as the top linear solver.
My reputation isn't high enough yet to leave comments, so I'm just adding another answer instead. I just wanted to mention that if you're not running under MPI, activating single_voi_relevance_reduction is essentially free. The real increase in memory use isn't due to the vectors themselves, but instead it's due to the index arrays that we store in order to transfer the data from source arrays to target arrays. We're forced to use index arrays under MPI, because PETSc requires it, but when we're not using MPI we use python slice objects to do our data transfer. Slice objects require very little memory.
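To illustrate the memory point, here is a minimal sketch in plain Python using only the standard library. It is not OpenMDAO's actual transfer code; the names source, index_array, and index_slice are made up for this illustration. It moves the same contiguous block of data once through an explicit index array and once through a slice object, then compares how much memory each index representation needs.

```python
import sys
from array import array

n = 100_000

# Hypothetical source vector holding 2 * n double-precision values.
source = array("d", range(2 * n))

# Explicit index array: one stored integer per transferred entry.
index_array = array("q", range(n, 2 * n))

# Slice object: only start, stop and step are stored, regardless of n.
index_slice = slice(n, 2 * n)

# Both representations select the same entries from the source.
via_indices = array("d", (source[i] for i in index_array))
via_slice = source[index_slice]

index_bytes = sys.getsizeof(index_array)
slice_bytes = sys.getsizeof(index_slice)
print("index array:", index_bytes, "bytes; slice:", slice_bytes, "bytes")
```

The index array grows linearly with the number of transferred entries, while the slice stays at a small constant size.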

Cheat sheet for caffe / pycaffe?

Does anyone know whether there is a cheat sheet for all important pycaffe commands?
I was so far using caffe only via Matlab interface and terminal + bash scripts.
I wanted to shift towards using ipython and work through the ipython notebook examples. However I find it hard to get an overview of all the functions that are inside the caffe module for python. (I'm also quite new to python).
The pycaffe tests and this file are the main gateway to the python coding interface.
First of all, you would like to choose whether to use Caffe with CPU or GPU. It is sufficient to call caffe.set_mode_cpu() or caffe.set_mode_gpu(), respectively.
Net
The main class that the pycaffe interface exposes is the Net. It has two constructors:
net = caffe.Net('/path/prototxt/descriptor/file', caffe.TRAIN)
which simply creates a Net (in this case using the Data Layer specified for training), or
net = caffe.Net('/path/prototxt/descriptor/file', '/path/caffemodel/weights/file', caffe.TEST)
which creates a Net and automatically loads the weights as saved in the provided caffemodel file - in this case using the Data Layer specified for testing.
A Net object has several attributes and methods. They can be found here. I will cite just the ones I use more often.
You can access the network blobs by means of Net.blobs. E.g.
data = net.blobs['data'].data
net.blobs['data'].data[...] = my_image
fc7_activations = net.blobs['fc7'].data
You can access the parameters (weights) too, in a similar way. E.g.
nice_edge_detectors = net.params['conv1'].data
higher_level_filter = net.params['fc7'].data
OK, now it's time to actually feed the net some data, using the forward() and backward() methods. If you want to classify a single image:
net.blobs['data'].data[...] = my_image
net.forward() # equivalent to net.forward_all()
softmax_probabilities = net.blobs['prob'].data
The backward() method works analogously if you are interested in computing gradients.
You can save the net weights to subsequently reuse them. It's just a matter of
net.save('/path/to/new/caffemodel/file')
Solver
The other core component exposed by pycaffe is the Solver. There are several types of solver, but I'm going to use only SGDSolver for the sake of clarity. It is needed in order to train a caffe model.
You can instantiate the solver with
solver = caffe.SGDSolver('/path/to/solver/prototxt/file')
The Solver will encapsulate the network you are training and, if present, the network used for testing. Note that they are usually the same network, only with a different Data Layer. The networks are accessible with
training_net = solver.net
test_net = solver.test_nets[0] # more than one test net is supported
Then, you can perform a solver iteration, that is, a forward/backward pass with weight update, typing just
solver.step(1)
or run the solver until the last iteration, with
solver.solve()
Other features
Note that pycaffe allows you to do more stuff, such as specifying the network architecture through a Python class or creating a new Layer type.
These features are less often used, but they are pretty easy to understand by reading the test cases.
Please note that the answer by Flavio Ferrara has a little problem that may cause you to waste a lot of time:
net.blobs['data'].data[...] = my_image
net.forward()
The code above has no effect if your first layer is a Data layer: net.forward() starts from the first layer, so the data layer overwrites the my_image you inserted. You get no error, just completely irrelevant output. The correct way is to specify the start and end layers, for example:
net.forward(start='conv1', end='fc')
Here is a Github repository of Face Verification Experiment on LFW Dataset, using pycaffe and some matlab code. I guess it could help a lot, especially the caffe_ftr.py file.
https://github.com/AlfredXiangWu/face_verification_experiment
Besides, here are some short code examples that use pycaffe for image classification:
http://codrspace.com/Jaleyhd/caffe-python-tutorial/
http://prog3.com/sbdm/blog/u011762313/article/details/48342495

How to get a function from a symbol without using eval?

I've got a symbol that represents the name of a function to be called:
julia> func_sym = :tanh
I can use that symbol to get the tanh function and call it using:
julia> eval(func_sym)(2)
0.9640275800758169
But I'd rather avoid the 'eval' there as it will be called many times and it's expensive (and func_sym can have several different values depending on context).
IIRC in Ruby you can say something like:
obj.send(func_sym, args)
Is there something similar in Julia?
EDIT: some more details on why I have functions represented by symbols:
I have a type (from a neural network) that includes the activation function. Originally I included it as a function:
type NeuralLayer
    weights::Matrix{Float32}
    biases::Vector{Float32}
    a_func::Function
end
However, I needed to serialize these things to files using JLD, but it's not possible to serialize a Function, so I went with a symbol:
type NeuralLayer
    weights::Matrix{Float32}
    biases::Vector{Float32}
    a_func::Symbol
end
And currently I use the eval approach above to call the activation function. There are collections of NeuralLayers and each can have its own activation function.
@Isaiah's answer is spot-on, perhaps even more so after the edit to the original question. To elaborate and make this more specific to your case: I'd change your NeuralLayer type to be parametric:
type NeuralLayer{func_type}
    weights::Matrix{Float32}
    biases::Vector{Float32}
end
Since func_type doesn't appear in the types of the fields, the constructor will require you to explicitly specify it: layer = NeuralLayer{:excitatory}(w, b). One restriction here is that you cannot modify a type parameter.
Now, func_type could be a symbol (like you're doing now) or it could be a more functionally relevant parameter (or parameters) that tunes your activation function. Then you define your activation functions like this:
# If you define your NeuralLayer with just one parameter:
activation(layer::NeuralLayer{:inhibitory}) = …
activation(layer::NeuralLayer{:excitatory}) = …
# Or if you want to use several physiological parameters instead:
activation{g_K,g_Na,g_l}(layer::NeuralLayer{g_K,g_Na,g_l}) = f(g_K, g_Na, g_l)
The key point is that functions and behavior are external to the data. Use type definitions and abstract type hierarchies to define behavior, as is coded in the external functions… but only store data itself in the types. This is dramatically different from Python or other strongly object-oriented paradigms, and it takes some getting used to.
But I'd rather avoid the 'eval' there as it will be called many times and it's expensive (and func_sym can have several different values depending on context).
This sort of dynamic dispatch is possible in Julia, but not recommended. Changing the value of 'func_sym' based on context defeats type inference as well as method specialization and inlining. Instead, the recommended approach is to use multiple dispatch, as detailed in the Methods section of the manual.

Trouble implementing a very simple mass flow source

I am currently learning Modelica by trying some very simple examples. I have defined a connector Incompressible for an incompressible fluid like this:
connector Incompressible
    flow Modelica.SIunits.VolumeFlowRate V_dot;
    Modelica.SIunits.SpecificEnthalpy h;
    Modelica.SIunits.Pressure p;
end Incompressible;
I now wish to define a mass or volume flow source:
model Source_incompressible
    parameter Modelica.SIunits.VolumeFlowRate V_dot;
    parameter Modelica.SIunits.Temperature T;
    parameter Modelica.SIunits.Pressure p;
    Incompressible outlet;
equation
    outlet.V_dot = V_dot;
    outlet.h = enthalpyWaterIncompressible(T); // quick'n'dirty enthalpy function
    outlet.p = p;
end Source_incompressible;
However, when checking Source_incompressible, I get this:
The problem is structurally singular for the element type Real.
The number of scalar Real unknown elements are 3.
The number of scalar Real equation elements are 4.
I am at a loss here. Clearly, there are three equations in the model - where does the fourth equation come from?
Thanks a lot for any insight.
Dominic,
There are a couple of issues going on here. As Martin points out, the connector is unbalanced (you don't have matching "through" and "across" pairs in that connector). For fluid systems, this is acceptable. However, intensive fluid properties (e.g., enthalpy) have to be marked as so-called "stream" variables.
This topic is, admittedly, pretty complicated. I'm planning on adding an advanced chapter to my online Modelica book on this topic but I haven't had the time yet. In the meantime, I would suggest you have a look at the Modelica.Fluid library and/or this presentation by one of its authors, Francesco Casella.
That connector is not a physical connector. You need one flow variable for each potential variable. This is the OpenModelica error message if it helps a little:
Warning: Connector .Incompressible is not balanced: The number of potential variables (2) is not equal to the number of flow variables (1).
Error: Too many equations, over-determined system. The model has 4 equation(s) and 3 variable(s).
Error: Internal error Found Equation without time dependent variables outlet.V_dot = V_dot
This is because the unconnected connector will generate one equation for the flow:
outlet.V_dot = 0.0;
This means outlet.V_dot is replaced in:
outlet.V_dot = V_dot;
And you get:
0.0 = V_dot;
But V_dot is a parameter and cannot be assigned in an equation section (it needs an initial equation if the parameter has fixed=false, or a binding equation in the default case).
