I am setting up an optimization routine with three components using SLSQP, external code, explicit comp and FD differentiation.
Comp1 : IndepVarComp with output ='x'
Comp2 : External Code input ='x' / output = 'f','g' - derivatives
='fd'
Comp3 : Explicit Comp input ='f','g' /
output = 'obj' - analytical
derivatives declared
More specifically the Comp2 is similar to the paraboloid example and Comp3 is just a linear function. There are no constraints.
If I use the central FD on the Comp2, it evaluates the External Code with the same input value (x) 4 times before it changes it, with the forward FD this is two times. I understand vaguely what the optimizer is trying to do but
repeating the same calculation is just a computational burden.
My remedy is to save the output file with the corresponding input name and in the second iteration check if this file exists, if so skip the calculation and read in from file. This works fine.
My questions are :
--- I was wondering should there be an embedded way to avoid re-evaluating the same function?
--- Even though I claim I understand a bit what the optimizer does, it is not clear why it tries to find a gradient without stepping.
Important Note :
This does not happen if Comp2 has two inputs, one output.
It happens again if Comp2 has two inputs, two outputs
Below an example output of the first few iterations.
ExtInX = [10.], , ExtOutF = 79.0, ExtOutG = 373.0
OBJECTIVE 589362.0
----------------------------------------------------------------------
ExtInX = [10.], , ExtOutF = 79.0, ExtOutG = 373.0
OBJECTIVE 589362.0
----------------------------------------------------------------------
ExtInX = [10.], , ExtOutF = 79.0, ExtOutG = 373.0
ExtInX = [10.], , ExtOutF = 79.0, ExtOutG = 373.0
ExtInX = [10.], , ExtOutF = 79.0, ExtOutG = 373.0
ExtInX = [10.], , ExtOutF = 79.0, ExtOutG = 373.0
ExtInX = [10.000001], , ExtOutF = 79.00001700000098, ExtOutG = 373.0001500000209
ExtInX = [9.999999], , ExtOutF = 78.99998300000101, ExtOutG = 372.9998500000211
ExtInX = [8.07102609], , ExtOutF = 49.928382692675996, ExtOutG = 154.61605669448198
OBJECTIVE 250797.01369955452
----------------------------------------------------------------------
**ExtInX = [8.07102609], , ExtOutF = 49.928382692675996, ExtOutG = 154.61605669448198
ExtInX = [8.07102609], , ExtOutF = 49.928382692675996, ExtOutG = 154.61605669448198
ExtInX = [8.07102609], , ExtOutF = 49.928382692675996, ExtOutG = 154.61605669448198
ExtInX = [8.07102609], , ExtOutF = 49.928382692675996, ExtOutG = 154.61605669448198**
ExtInX = [8.07102709], , ExtOutF = 49.92839583472901, ExtOutG = 154.61613684041134
ExtInX = [8.07102509], , ExtOutF = 49.92836955062501, ExtOutG = 154.61597654858318
ExtInX = [4.88062761], , ExtOutF = 18.178645674383997, ExtOutG = 21.29321703417343
OBJECTIVE 38811.35361617729
AFTER THE FIRST ANSWER
Thank you for your answer.
From that I understand that in 2.2 the caching has to be done by the user and
one can do that in different ways. This is fine.
And the improvement you mention in 2.3 seems to fix one thing.
The single extra call in the beginning. I am more concerned about the
several calls with the same input during iterations.
I am adding the four code patches below. One should be able to run the optimizer
as long as the others are in the same folder.
In this case the external code is pure python. But simultaneously I am
developing an optimizer that uses an 'actual' external code. It is again
wrapped in python though. So some system calls are sent to the executable
and python handles the outputs from the external code, python does some post
processing and outputs a text file which is read by the OpenMDAO external code
component. So not sure if that one counts as pure python.
Regarding the constraints : In the below code there is none. In the other code
that I had explained there are some lifetime constraints for car components
(e.g. xx, lower=20).
CODE :
Optimizer.py : Driver
and other 3 .py files
https://gist.github.com/anonymous/2c9d5d182cbb24a2334c97b57a954802
In OpenMDAO v2.2 there is no build in way to handle result caching. So you have to roll your own solution there. Instead of saving the full output file, you could do the caching in memory and just store the last known input values. If they haven't changed, you can just leave the output values where they are.
In OpenMDAO V2.2 the drivers are coded to make one extra function call before even starting the optimization. This is done because of the way OpenMDAO handles linear constraints. It will pre-compute the derivatives associated with the linear constraints and store them.
In OpenMDAO V2.3 we've modified the drivers so they only make this extra call if you have linear constraints in your problem. So that, at least, explains one of your extra functions calls.
After examining your test problem there does appear to be a bug in the way OpenMDAO is finite-differencing the components. The problem shows up even if you use forward mode, and it appears related to the way you're declaring the partials.
There isn't anything wrong with the way your declared them, but with an alternate method of declaration I was able to get it to behave properly:
from openmdao.api import ExternalCode
import numpy as np
class ParaboloidExternalCode(ExternalCode):
def setup(self):
self.add_input('x', val=0.0)
self.add_input('y', val=0.0)
self.add_output('f_xy', val=0.0)
self.add_output('g_xy', val=0.0)
self.input_file = 'paraboloid_input.dat'
self.output_file = 'paraboloid_output.dat'
# providing these is optional; the component will verify that any input
# files exist before execution and that the output files exist after.
self.options['external_input_files'] = [self.input_file,]
self.options['external_output_files'] = [self.output_file,]
self.options['command'] = [
'python', 'extcode_paraboloid.py', self.input_file, self.output_file
]
# self.declare_partials(of='*', wrt='*', method='fd', form='central', step=1e-6)
self.declare_partials(of='f_xy', wrt=['x','y'], method='fd', form='central', step=1e-6)
self.declare_partials(of='g_xy', wrt=['x','y'], method='fd', form='central', step=1e-6)
def compute(self, inputs, outputs):
x = inputs['x']
y = inputs['y']
# generate the input file for the paraboloid external code
with open(self.input_file, 'w') as input_file:
input_file.write('%f\n%f\n' % (x,y))
# the parent compute function actually runs the external code
super(ParaboloidExternalCode, self).compute(inputs, outputs)
file_contents = np.loadtxt(self.output_file)
outputs['f_xy'] = file_contents [0]
outputs['g_xy'] = file_contents [1]
print('ExtInX = {}, ExtInY = {},, ExtOutF_XY = {}, ExtOutG_XY = {} '.format(x,y,file_contents [0],file_contents[1]) )
I got rid of the of='*', wrt='*' and replaced it with two separate calls which explicitly name the variables. I was also able to replicate this problem without the external code at all all, using an in-memory paraboloid. I've entered the bug into the the development backlog and we'll get it sorted out. In the meantime, this workaround solves the problem.
Please note that there was a bug in OpenMDAO V2.2 that did cause extra evaluations. It was fixed via OpenMDAO PR # 553
Related
I am trying to fine-tune a Huggingface bert-large-uncased-whole-word-masking model and i get a type error like this when training:
"TypeError: only integer tensors of a single element can be converted to an index"
Here is the code:
train_inputs = tokenizer(text_list[0:457], return_tensors='pt', max_length=512, truncation=True, padding='max_length')
train_inputs['labels']= train_inputs.input_ids.detach().clone()
Then i mask randomly about 15% of the words in the input-ids,
and define a class for the dataset, and then the mistake happens in the training loop:
class MeditationsDataset(torch.utils.data.Dataset):
def __init__(self, encodings):
self.encodings= encodings
def __getitem__(self, idx):
return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
def __len__(self):
return self.encodings.input_ids
train_dataset = MeditationsDataset(train_inputs)
train_dataloader = torch.utils.data.DataLoader(dataset= train_dataset, batch_size=8, shuffle=False)
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
from transformers import BertModel, AdamW
model = BertModel.from_pretrained("bert-large-uncased-whole-word-masking")
model.to(device)
model.train()
optim = AdamW(model.parameters(), lr=1e-5)
num_epochs = 2
from tqdm.auto import tqdm
for epoch in range(num_epochs):
loop = tqdm(train_dataloader, leave=True)
for batch in loop:
batch = {k: v.to(device) for k, v in batch.items()}
outputs = model(**batch)
loss = outputs.loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
The mistake happens in "for batch in loop"
Does anybody understand it and know how to solve this? Thanks in advance for your help
In the class MeditationsDataset in function __getitem__ torch.tensor(val[idx]) is deprecated by PyTorch you should use instead val[idx].clone().detach()
I am trying to make a toy problem to learn a bit about the OpenMDAO software before applying the lessons to a larger problem. I have a problem set up so that the objective function should be minimized when both design variables are at a minimum. However both values stay at their originally assigned values despite receiving an 'Optimization terminated successfully' message.
I have been starting by writing the code based on the Sellar problem examples. ( http://openmdao.org/twodocs/versions/latest/basic_guide/sellar.html ) Additionally I have come across a stack overflow question that seems to be the same problem, but the solution there doesn't work. ( OpenMDAO: Solver converging to non-optimal point ) (When I add the declare_partials line to the IntermediateCycle or ScriptForTest I recieve an error saying either that self is not defined, or that the object has no attribute declare_partials)
This is the script that runs everything
import openmdao.api as om
from IntermediateForTest import IntermediateCycle
prob = om.Problem()
prob.model = IntermediateCycle()
prob.driver = om.ScipyOptimizeDriver()
#prob.driver.options['optimizer'] = 'SLSQP'
#prob.driver.options['tol'] = 1e-9
prob.model.add_design_var('n_gear', lower=2, upper=6)
prob.model.add_design_var('stroke', lower=0.0254, upper=1)
prob.model.add_objective('objective')
prob.setup()
prob.model.approx_totals()
prob.run_driver()
print(prob['objective'])
print(prob['cycle.f1.total_weight'])
print(prob['cycle.f1.stroke'])
print(prob['cycle.f1.n_gear'])
It calls an intermediate group, as per the Sellar example
import openmdao.api as om
from FunctionsForTest import FunctionForTest1
from FunctionsForTest import FunctionForTest2
class IntermediateCycle(om.Group):
def setup(self):
indeps = self.add_subsystem('indeps', om.IndepVarComp(), promotes=['*'])
indeps.add_output('n_gear', 3.0)
indeps.add_output('stroke', 0.2)
indeps.add_output('total_weight', 26000.0)
cycle = self.add_subsystem('cycle', om.Group())
cycle.add_subsystem('f1', FunctionForTest1())
cycle.add_subsystem('f2', FunctionForTest2())
cycle.connect('f1.landing_gear_weight','f2.landing_gear_weight')
cycle.connect('f2.total_weight','f1.total_weight')
self.connect('n_gear','cycle.f1.n_gear')
self.connect('stroke','cycle.f1.stroke')
#cycle.nonlinear_solver = om.NonlinearBlockGS()
self.nonlinear_solver = om.NonlinearBlockGS()
self.add_subsystem('objective', om.ExecComp('objective = total_weight', objective=26000, total_weight=26000), promotes=['objective', 'total_weight'])
Finally there is a file with the two functions in it:
import openmdao.api as om
class FunctionForTest1(om.ExplicitComponent):
def setup(self):
self.add_input('stroke', val=0.2)
self.add_input('n_gear', val=3.0)
self.add_input('total_weight', val=26000)
self.add_output('landing_gear_weight')
self.declare_partials('*', '*', method='fd')
def compute(self, inputs, outputs):
stroke = inputs['stroke']
n_gear = inputs['n_gear']
total_weight = inputs['total_weight']
outputs['landing_gear_weight'] = total_weight * 0.1 + 100*stroke * n_gear ** 2
class FunctionForTest2(om.ExplicitComponent):
def setup(self):
self.add_input('landing_gear_weight')
self.add_output('total_weight')
self.declare_partials('*', '*', method='fd')
def compute(self, inputs, outputs):
landing_gear_weight = inputs['landing_gear_weight']
outputs['total_weight'] = 26000 + landing_gear_weight
It reports optimization terminated successfully,
Optimization terminated successfully. (Exit mode 0)
Current function value: 26000.0
Iterations: 1
Function evaluations: 1
Gradient evaluations: 1
Optimization Complete
-----------------------------------
[26000.]
[29088.88888889]
[0.2]
[3.]
however the value for the function to optimize hasn't changed. It seems as it converges the loop to estimate the weight, but doesn't vary the design variables to find the optimum.
It arrives at 29088.9, which is correct for a value of n_gear=3 and stroke=0.2, but if both are decreased to the bounds of n_gear=2 and stroke=0.0254, it would arrive at a value of ~28900, ~188 less.
Any advice, links to tutorials, or solutions would be appreciated.
Lets take a look at the n2 of the model, as you provided it:
I've highlighted the connection from indeps.total_weight to objective.total_weight. So this means that your computed total_weight value is not being passed to your objective output at all. Instead you have a constant value being set there.
Now, taking a small step back, lets look at the computation of the objective itself:
self.add_subsystem('objective', om.ExecComp('objective = total_weight', objective=26000, total_weight=26000), promotes=['objective', 'total_weight'])
So this is an odd use of the ExecComp, because it just sets the output to exactly the input. It does nothing, and isn't really needed at all.
I believe what you wanted was simply to make the objective be the output f2.total_weight. When I do that (and make a few additional small cleanups to your code, like removing the unnecessary ExecComp, then I do get the correct answer in 2 major iterations of the optimizer:
import openmdao.api as om
class FunctionForTest1(om.ExplicitComponent):
def setup(self):
self.add_input('stroke', val=0.2)
self.add_input('n_gear', val=3.0)
self.add_input('total_weight', val=26000)
self.add_output('landing_gear_weight')
self.declare_partials('*', '*', method='fd')
def compute(self, inputs, outputs):
stroke = inputs['stroke']
n_gear = inputs['n_gear']
total_weight = inputs['total_weight']
outputs['landing_gear_weight'] = total_weight * 0.1 + 100*stroke * n_gear ** 2
class FunctionForTest2(om.ExplicitComponent):
def setup(self):
self.add_input('landing_gear_weight')
self.add_output('total_weight')
self.declare_partials('*', '*', method='fd')
def compute(self, inputs, outputs):
landing_gear_weight = inputs['landing_gear_weight']
outputs['total_weight'] = 26000 + landing_gear_weight
class IntermediateCycle(om.Group):
def setup(self):
indeps = self.add_subsystem('indeps', om.IndepVarComp(), promotes=['*'])
indeps.add_output('n_gear', 3.0)
indeps.add_output('stroke', 0.2)
cycle = self.add_subsystem('cycle', om.Group())
cycle.add_subsystem('f1', FunctionForTest1())
cycle.add_subsystem('f2', FunctionForTest2())
cycle.connect('f1.landing_gear_weight','f2.landing_gear_weight')
cycle.connect('f2.total_weight','f1.total_weight')
self.connect('n_gear','cycle.f1.n_gear')
self.connect('stroke','cycle.f1.stroke')
#cycle.nonlinear_solver = om.NonlinearBlockGS()
self.nonlinear_solver = om.NonlinearBlockGS()
prob = om.Problem()
prob.model = IntermediateCycle()
prob.driver = om.ScipyOptimizeDriver()
#prob.driver.options['optimizer'] = 'SLSQP'
#prob.driver.options['tol'] = 1e-9
prob.model.add_design_var('n_gear', lower=2, upper=6)
prob.model.add_design_var('stroke', lower=0.0254, upper=1)
prob.model.add_objective('cycle.f2.total_weight')
prob.model.approx_totals()
prob.setup()
prob.model.nl_solver.options['iprint'] = 2
prob.run_driver()
print(prob['cycle.f1.total_weight'])
print(prob['cycle.f2.total_weight'])
print(prob['cycle.f1.stroke'])
print(prob['cycle.f1.n_gear'])
gives:
Optimization terminated successfully. (Exit mode 0)
Current function value: 28900.177777779667
Iterations: 2
Function evaluations: 2
Gradient evaluations: 2
Optimization Complete
-----------------------------------
[28900.1777778]
[28900.17777778]
[0.0254]
[2.]
I am new to OpenMDAO and am trying to solve an optimization problem. When I run the code, I receive the following error "'numpy.ndarray' object has no attribute 'log'" Cannot resolve the problem? Any suggestions?
I have reviewed the OpenMDAO documentation.
Error message: 'numpy.ndarray' object has no attribute 'log'
from __future__ import division, print_function
import openmdao.api as om
import numpy as np
class Objective (om.ExplicitComponent):
def setup(self):
self.add_input('mu1', val = 3.84)
self.add_input('mu2', val = 3.84)
self.add_output('f', val = 0.00022)
# Finite difference all partials.
self.declare_partials('*', '*', method='fd')
def compute(self, inputs, outputs):
mu1 = inputs['mu1']
mu2 = inputs['mu2']
outputs['f'] = np.log((mu1*(0.86))/(1.0-(mu1*0.14)))+np.log((mu2*(0.86))/(1.0-(mu2*0.14)))
# build the model
prob = om.Problem()
indeps = prob.model.add_subsystem('indeps', om.IndepVarComp())
indeps.add_output('mu1', 3.84)
indeps.add_output('mu2', 3.84)
prob.model.add_subsystem('obj', Objective())
prob.model.add_subsystem('cnst', om.ExecComp('g = 7924.8 - 2943.0*(np.log(mu1)) - 2943.0*(np.log(mu2))'))
prob.model.connect('indeps.mu1', 'obj.mu1')
prob.model.connect('indeps.mu2', 'obj.mu2')
prob.model.connect('indeps.mu1', 'cnst.mu1')
prob.model.connect('indeps.mu2', 'cnst.mu2')
# setup the optimization
prob.driver = om.ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'COBYLA'
prob.model.add_design_var('indeps.mu1', lower=0.0, upper=5.0)
prob.model.add_design_var('indeps.mu2', lower=0.0, upper=5.0)
prob.model.add_objective('obj.f')
prob.model.add_constraint('cnst.g', lower=0.0, upper=0.0)
prob.setup()
prob.run_driver()
The problem is in the definition of your ExecComp. You have np.log, but the way the string parsing works for that you just wanted log.
Try this instead:
'g = 7924.8 - 2943.0*(log(mu1)) - 2943.0*(log(mu2))'
With that change I got:
Normal return from subroutine COBYLA
NFVALS = 46 F = 3.935882E+00 MAXCV = 2.892193E-10
X = 3.843492E+00 3.843491E+00
Optimization Complete
With reconfigurable model execution it is possible to resize inputs and outputs of components. How are the connections updated, when reconfigured outputs and inputs are connected?
In the example below the output c2.y and c3.y is resized at each model run. This input and output is supposed to be connected, as shown in the N2 chart. However, after the reconfiguration the connection size seems to be not updated automatically, it throws the following error:
ValueError: The source and target shapes do not match or are ambiguous for the connection 'c2.y' to 'c3.y'. Expected (1,) but got (2,).
I included below 3 tests, with promoted connection, absolute connection, and the last one with reconfiguration but without the connection (which works).
The last chance would be to declare the connection in the parent group of the comps, which I did not try yet.
The tests:
Promoted connection
Absolute connection
No connection
Reconfigurable component classes and tests:
from __future__ import division
import logging
import numpy as np
import unittest
from openmdao.api import Problem, Group, IndepVarComp, ExplicitComponent
from openmdao.utils.assert_utils import assert_rel_error
class ReconfComp(ExplicitComponent):
def initialize(self):
self.size = 1
self.counter = 0
def reconfigure(self):
logging.info('reconf started {}'.format(self.pathname))
self.counter += 1
logging.info('reconf ended {}'.format(self.pathname))
if self.counter % 2 == 0:
self.size += 1
return True
else:
return False
def setup(self):
logging.info('setup started {}'.format(self.pathname))
self.add_input('x', val=1.0)
self.add_output('y', val=np.zeros(self.size))
# All derivatives are defined.
self.declare_partials(of='*', wrt='*')
logging.info('setup ended {}'.format(self.pathname))
def compute(self, inputs, outputs):
logging.info('compute started {}'.format(self.pathname))
outputs['y'] = 2 * inputs['x']
logging.info('compute ended {}'.format(self.pathname))
def compute_partials(self, inputs, jacobian):
jacobian['y', 'x'] = 2 * np.ones((self.size, 1))
class ReconfComp2(ReconfComp):
"""The size of the y input changes the same as way as in ReconfComp"""
def setup(self):
logging.info('setup started {}'.format(self.pathname))
self.add_input('y', val=np.zeros(self.size))
self.add_output('f', val=np.zeros(self.size))
# All derivatives are defined.
self.declare_partials(of='*', wrt='*')
logging.info('setup ended {}'.format(self.pathname))
def compute(self, inputs, outputs):
logging.info('compute started {}'.format(self.pathname))
outputs['f'] = 2 * inputs['y']
logging.info('compute ended {}'.format(self.pathname))
def compute_partials(self, inputs, jacobian):
jacobian['f', 'y'] = 2 * np.ones((self.size, 1))
class TestReconfConnections(unittest.TestCase):
def test_reconf_comp_promoted_connections(self):
p = Problem()
p.model = Group()
p.model.add_subsystem('c1', IndepVarComp('x', 1.0), promotes_outputs=['x'])
p.model.add_subsystem('c2', ReconfComp(), promotes_inputs=['x'], promotes_outputs=['y'])
p.model.add_subsystem('c3', ReconfComp2(), promotes_inputs=['y'],
promotes_outputs=['f'])
p.setup()
p['x'] = 3.
# First run the model once; counter = 1, size of y = 1
p.run_model()
totals = p.compute_totals(wrt=['x'], of=['y'])
assert_rel_error(self, p['x'], 3.0)
assert_rel_error(self, p['y'], 6.0)
assert_rel_error(self, totals['y', 'x'], [[2.0]])
print(p['x'], p['y'], totals['y', 'x'].flatten())
# Run the model again, which will trigger reconfiguration; counter = 2, size of y = 2
p.run_model() # FIXME Fails with ValueError
def test_reconf_comp_connections(self):
p = Problem()
p.model = Group()
p.model.add_subsystem('c1', IndepVarComp('x', 1.0), promotes_outputs=['x'])
p.model.add_subsystem('c2', ReconfComp(), promotes_inputs=['x'])
p.model.add_subsystem('c3', ReconfComp2(), promotes_outputs=['f'])
p.model.connect('c2.y', 'c3.y')
p.setup()
p['x'] = 3.
# First run the model once; counter = 1, size of y = 1
p.run_model()
# Run the model again, which will trigger reconfiguration; counter = 2, size of y = 2
p.run_model() # FIXME Fails with ValueError
def test_reconf_comp_not_connected(self):
p = Problem()
p.model = Group()
p.model.add_subsystem('c1', IndepVarComp('x', 1.0), promotes_outputs=['x'])
p.model.add_subsystem('c2', ReconfComp(), promotes_inputs=['x'])
p.model.add_subsystem('c3', ReconfComp2(), promotes_outputs=['f'])
# c2.y not connected to c3.y
p.setup()
p['x'] = 3.
# First run the model once; counter = 1, size of y = 1
p.run_model()
# Run the model again, which will trigger reconfiguration; counter = 2, size of y = 2
fail, _, _ = p.run_model()
self.assertFalse(fail)
if __name__ == '__main__':
unittest.main()
UPDATE:
It seems, that in Group._var_abs2meta only the source size is updated, but not the target. The setup of the connections starts, before the setup of the parent group or the setup of the other component would be called.
UPDATE 2:
This happens with the default NonlinearRunOnce solver, with a NewtonSolver of NonlinearBlockGS there is no error, but the variable sizes also don't change.
As of OpenMDAO V2.5 reconfigurable model variables is not an officially supported feature in the framework. The bare bones of the capability has been in the code since that research was done, but it wasn't something that was high priority enough for us to finalize. A recent major refactor in V2.4 re-worked how some underlying data-structures worked and must have broken this functionality.
It is on our development priority list to get this working again, but its not super high on that list. We focus development mainly on features that have a direct in-house applications, and we don't have one of those yet.
If you could provide a decently complete set of tests for it, we could take a look at getting the functionality working.
I'm implementing an RBF network by using some beginer examples from Pytorch Website. I have a problem when implementing the kernel bandwidth differentiation for the network. Also, Iwould like to know whether my attempt ti implement the idea is fine. This is a code sample to reproduce the issue. Thanks
# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable
def kernel_product(x,y, mode = "gaussian", s = 1.):
x_i = x.unsqueeze(1)
y_j = y.unsqueeze(0)
xmy = ((x_i-y_j)**2).sum(2)
if mode == "gaussian" : K = torch.exp( - xmy/s**2) )
elif mode == "laplace" : K = torch.exp( - torch.sqrt(xmy + (s**2)))
elif mode == "energy" : K = torch.pow( xmy + (s**2), -.25 )
return torch.t(K)
class MyReLU(torch.autograd.Function):
"""
We can implement our own custom autograd Functions by subclassing
torch.autograd.Function and implementing the forward and backward passes
which operate on Tensors.
"""
#staticmethod
def forward(ctx, input):
"""
In the forward pass we receive a Tensor containing the input and return
a Tensor containing the output. ctx is a context object that can be used
to stash information for backward computation. You can cache arbitrary
objects for use in the backward pass using the ctx.save_for_backward method.
"""
ctx.save_for_backward(input)
return input.clamp(min=0)
#staticmethod
def backward(ctx, grad_output):
"""
In the backward pass we receive a Tensor containing the gradient of the loss
with respect to the output, and we need to compute the gradient of the loss
with respect to the input.
"""
input, = ctx.saved_tensors
grad_input = grad_output.clone()
grad_input[input < 0] = 0
return grad_input
dtype = torch.cuda.FloatTensor
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)
# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(H, D_in).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
# I've created this scalar variable (the kernel bandwidth)
s = Variable(torch.randn(1).type(dtype), requires_grad=True)
learning_rate = 1e-6
for t in range(500):
# To apply our Function, we use Function.apply method. We alias this as 'relu'.
relu = MyReLU.apply
# Forward pass: compute predicted y using operations on Variables; we compute
# ReLU using our custom autograd operation.
# y_pred = relu(x.mm(w1)).mm(w2)
y_pred = relu(kernel_product(w1, x, s)).mm(w2)
# Compute and print loss
loss = (y_pred - y).pow(2).sum()
print(t, loss.data[0])
# Use autograd to compute the backward pass.
loss.backward()
# Update weights using gradient descent
w1.data -= learning_rate * w1.grad.data
w2.data -= learning_rate * w2.grad.data
# Manually zero the gradients after updating weights
w1.grad.data.zero_()
w2.grad.data.zero_()
However I get this error, which dissapears when I simply use a fixed scalar in the default input parameter of kernel_product():
RuntimeError: eq() received an invalid combination of arguments - got (str), but expected one of:
* (float other)
didn't match because some of the arguments have invalid types: (str)
* (Variable other)
didn't match because some of the arguments have invalid types: (str)
Well, you are calling kernel_product(w1, x, s) where w1, x and s are torch Variable while the definition of the function is: kernel_product(x,y, mode = "gaussian", s = 1.). Seems like s should be a string specifying the mode.