Error when implementing RBF kernel bandwidth differentiation in Pytorch - python-3.6

I'm implementing an RBF network by using some beginer examples from Pytorch Website. I have a problem when implementing the kernel bandwidth differentiation for the network. Also, Iwould like to know whether my attempt ti implement the idea is fine. This is a code sample to reproduce the issue. Thanks
# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable
def kernel_product(x,y, mode = "gaussian", s = 1.):
x_i = x.unsqueeze(1)
y_j = y.unsqueeze(0)
xmy = ((x_i-y_j)**2).sum(2)
if mode == "gaussian" : K = torch.exp( - xmy/s**2) )
elif mode == "laplace" : K = torch.exp( - torch.sqrt(xmy + (s**2)))
elif mode == "energy" : K = torch.pow( xmy + (s**2), -.25 )
return torch.t(K)
class MyReLU(torch.autograd.Function):
"""
We can implement our own custom autograd Functions by subclassing
torch.autograd.Function and implementing the forward and backward passes
which operate on Tensors.
"""
#staticmethod
def forward(ctx, input):
"""
In the forward pass we receive a Tensor containing the input and return
a Tensor containing the output. ctx is a context object that can be used
to stash information for backward computation. You can cache arbitrary
objects for use in the backward pass using the ctx.save_for_backward method.
"""
ctx.save_for_backward(input)
return input.clamp(min=0)
#staticmethod
def backward(ctx, grad_output):
"""
In the backward pass we receive a Tensor containing the gradient of the loss
with respect to the output, and we need to compute the gradient of the loss
with respect to the input.
"""
input, = ctx.saved_tensors
grad_input = grad_output.clone()
grad_input[input < 0] = 0
return grad_input
dtype = torch.cuda.FloatTensor
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)
# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(H, D_in).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
# I've created this scalar variable (the kernel bandwidth)
s = Variable(torch.randn(1).type(dtype), requires_grad=True)
learning_rate = 1e-6
for t in range(500):
# To apply our Function, we use Function.apply method. We alias this as 'relu'.
relu = MyReLU.apply
# Forward pass: compute predicted y using operations on Variables; we compute
# ReLU using our custom autograd operation.
# y_pred = relu(x.mm(w1)).mm(w2)
y_pred = relu(kernel_product(w1, x, s)).mm(w2)
# Compute and print loss
loss = (y_pred - y).pow(2).sum()
print(t, loss.data[0])
# Use autograd to compute the backward pass.
loss.backward()
# Update weights using gradient descent
w1.data -= learning_rate * w1.grad.data
w2.data -= learning_rate * w2.grad.data
# Manually zero the gradients after updating weights
w1.grad.data.zero_()
w2.grad.data.zero_()
However I get this error, which dissapears when I simply use a fixed scalar in the default input parameter of kernel_product():
RuntimeError: eq() received an invalid combination of arguments - got (str), but expected one of:
* (float other)
didn't match because some of the arguments have invalid types: (str)
* (Variable other)
didn't match because some of the arguments have invalid types: (str)

Well, you are calling kernel_product(w1, x, s) where w1, x and s are torch Variable while the definition of the function is: kernel_product(x,y, mode = "gaussian", s = 1.). Seems like s should be a string specifying the mode.

Related

Optimizing Distributed I/O with serial output

I am having trouble understanding how to optimize a distributed component with a serial output. This is my attempt with an example problem given in the openmdao docs.
import numpy as np
import openmdao.api as om
from openmdao.utils.array_utils import evenly_distrib_idxs
from openmdao.utils.mpi import MPI
class MixedDistrib2(om.ExplicitComponent):
def setup(self):
# Distributed Input
self.add_input('in_dist', shape_by_conn=True, distributed=True)
# Serial Input
self.add_input('in_serial', val=1)
# Distributed Output
self.add_output('out_dist', copy_shape='in_dist', distributed=True)
# Serial Output
self.add_output('out_serial', copy_shape='in_serial')
#self.declare_partials('*','*', method='cs')
def compute(self, inputs, outputs):
x = inputs['in_dist']
y = inputs['in_serial']
# "Computationally Intensive" operation that we wish to parallelize.
f_x = x**2 - 2.0*x + 4.0
# These operations are repeated on all procs.
f_y = y ** 0.5
g_y = y**2 + 3.0*y - 5.0
# Compute square root of our portion of the distributed input.
g_x = x ** 0.5
# Distributed output
outputs['out_dist'] = f_x + f_y
# Serial output
if MPI and comm.size > 1:
# We need to gather the summed values to compute the total sum over all procs.
local_sum = np.array(np.sum(g_x))
total_sum = local_sum.copy()
self.comm.Allreduce(local_sum, total_sum, op=MPI.SUM)
outputs['out_serial'] = g_y * total_sum
else:
# Recommended to make sure your code can run in serial too, for testing.
outputs['out_serial'] = g_y * np.sum(g_x)
size = 7
if MPI:
comm = MPI.COMM_WORLD
rank = comm.rank
sizes, offsets = evenly_distrib_idxs(comm.size, size)
else:
# When running in serial, the entire variable is on rank 0.
rank = 0
sizes = {rank : size}
offsets = {rank : 0}
prob = om.Problem()
model = prob.model
# Create a distributed source for the distributed input.
ivc = om.IndepVarComp()
ivc.add_output('x_dist', np.zeros(sizes[rank]), distributed=True)
ivc.add_output('x_serial', val=1)
model.add_subsystem("indep", ivc)
model.add_subsystem("D1", MixedDistrib2())
model.add_subsystem('con_cmp1', om.ExecComp('con1 = y**2'), promotes=['con1', 'y'])
model.connect('indep.x_dist', 'D1.in_dist')
model.connect('indep.x_serial', ['D1.in_serial','y'])
prob.driver = om.ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'SLSQP'
model.add_design_var('indep.x_serial', lower=5, upper=10)
model.add_constraint('con1', upper=90)
model.add_objective('D1.out_serial')
prob.setup(force_alloc_complex=True)
#prob.setup()
# Set initial values of distributed variable.
x_dist_init = [1,1,1,1,1,1,1]
prob.set_val('indep.x_dist', x_dist_init)
# Set initial values of serial variable.
prob.set_val('indep.x_serial', 10)
#prob.run_model()
prob.run_driver()
print('x_dist', prob.get_val('indep.x_dist', get_remote=True))
print('x_serial', prob.get_val('indep.x_serial'))
print('Obj', prob.get_val('D1.out_serial'))
The problem is with defining partials with 'fd' or 'cs'. I cannot define partials of serial output w.r.t distributed input. So I used prob.setup(force_alloc_complex=True) to use complex step. But gives me this warning DerivativesWarning:Constraints or objectives [('D1.out_serial', inds=[0])] cannot be impacted by the design variables of the problem. I understand this is because the total derivative is 0 which causes the warning but I dont understand the reason. Clearly the total derivative should not be 0 here. But I guess this is because I didn't explicitly declare_partials in the component. I tried removing the distributed components and ran it again with declare_partials and this works correctly(code below).
import numpy as np
import openmdao.api as om
class MixedDistrib2(om.ExplicitComponent):
def setup(self):
self.add_input('in_dist', np.zeros(7))
self.add_input('in_serial', val=1)
self.add_output('out_serial', val=0)
self.declare_partials('*','*', method='cs')
def compute(self, inputs, outputs):
x = inputs['in_dist']
y = inputs['in_serial']
g_y = y**2 + 3.0*y - 5.0
g_x = x ** 0.5
outputs['out_serial'] = g_y * np.sum(g_x)
prob = om.Problem()
model = prob.model
model.add_subsystem("D1", MixedDistrib2(), promotes_inputs=['in_dist', 'in_serial'], promotes_outputs=['out_serial'])
model.add_subsystem('con_cmp1', om.ExecComp('con1 = in_serial**2'), promotes=['con1', 'in_serial'])
prob.driver = om.ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'SLSQP'
model.add_design_var('in_serial', lower=5, upper=10)
model.add_constraint('con1', upper=90)
model.add_objective('out_serial')
prob.setup(force_alloc_complex=True)
prob.set_val('in_dist', [1,1,1,1,1,1,1])
prob.set_val('in_serial', 10)
prob.run_model()
prob.check_totals()
prob.run_driver()
print('x_dist', prob.get_val('in_dist', get_remote=True))
print('x_serial', prob.get_val('in_serial'))
print('Obj', prob.get_val('out_serial'))
What I am trying to understand is
How to use 'fd' or 'cs' in Distributed component with a serial output?
What is the meaning of prob.setup(force_alloc_complex=True) ? Is not forcing to use cs in all the components in the problem ? If so why does the total derivative becomes 0?
When I run your code in OpenMDAO V 3.11.0 (after uncommenting the declare_partials call) I get the following error:
RuntimeError: 'D1' <class MixedDistrib2>: component has defined partial ('out_serial', 'in_dist') which is a serial output wrt a distributed input. This is only supported using the matrix free API.
As the error indicates, you can't use the matrix-based api for derivatives in this situations. The reasons why are a bit subtle, and probably outside the scope of what needs to be delt with to answer your question here. It boils down to OpenMDAO not knowing why kind of distributed operations are being done in the compute and having no way to manage those details when you propagate things in reverse.
So you need to use the matrix-free derivative APIs in this situation. When you use the matrix-free APIs you DO NOT declare any partials, because you don't want OpenMDAO to allocate any memory for you to store partials in (and you wouldn't use that memory even if it did).
I've coded them for your example here, but I need to note a few important details:
Your example has a distributed IVC, but as of OpenMDAO V3.11.0 you can't get total derivatives with respect to distributed design variables. I assume you just made it that way to make your simple test case, but in case your real problem was set up this way, you need to note this and not do it this way. Instead, make the IVC serial, and use src indices to distribute the correct parts to each proc.
In the example below, the derivatives are correct. However, there seems to be a bug in the check_partials output when running in paralle. So the reverse mode partials look like they are off by a factor of the comm size... this will have to get fixed in later releases.
I only did the derivatives for out_serial. out_dist will work similarly and is left as an excersize for the reader :)
You'll notice that I duplicates some code in the compute and compute_jacvec_product methods. You can abstract this duplicate code out into its own method (or call compute from within compute_jacvec_product by providing your own output dictionary). However, you might be asking why the duplicate call is needed at all? Why can't u store the values from the compute call. The answer is, in large part, that OpenMDAO does not guarantee that compute is always called before compute_jacvec_product. However, I'll also point out that this kind of code duplication is very AD-like. Any AD code will have the same kind of duplication built in, even though you don't see it.
import numpy as np
import openmdao.api as om
from openmdao.utils.array_utils import evenly_distrib_idxs
from openmdao.utils.mpi import MPI
class MixedDistrib2(om.ExplicitComponent):
def setup(self):
# Distributed Input
self.add_input('in_dist', shape_by_conn=True, distributed=True)
# Serial Input
self.add_input('in_serial', val=1)
# Distributed Output
self.add_output('out_dist', copy_shape='in_dist', distributed=True)
# Serial Output
self.add_output('out_serial', copy_shape='in_serial')
# self.declare_partials('*','*', method='fd')
def compute(self, inputs, outputs):
x = inputs['in_dist']
y = inputs['in_serial']
# "Computationally Intensive" operation that we wish to parallelize.
f_x = x**2 - 2.0*x + 4.0
# These operations are repeated on all procs.
f_y = y ** 0.5
g_y = y**2 + 3.0*y - 5.0
# Compute square root of our portion of the distributed input.
g_x = x ** 0.5
# Distributed output
outputs['out_dist'] = f_x + f_y
# Serial output
if MPI and comm.size > 1:
# We need to gather the summed values to compute the total sum over all procs.
local_sum = np.array(np.sum(g_x))
total_sum = local_sum.copy()
self.comm.Allreduce(local_sum, total_sum, op=MPI.SUM)
outputs['out_serial'] = g_y * total_sum
else:
# Recommended to make sure your code can run in serial too, for testing.
outputs['out_serial'] = g_y * np.sum(g_x)
def compute_jacvec_product(self, inputs, d_inputs, d_outputs, mode):
x = inputs['in_dist']
y = inputs['in_serial']
g_y = y**2 + 3.0*y - 5.0
# "Computationally Intensive" operation that we wish to parallelize.
f_x = x**2 - 2.0*x + 4.0
# These operations are repeated on all procs.
f_y = y ** 0.5
g_y = y**2 + 3.0*y - 5.0
# Compute square root of our portion of the distributed input.
g_x = x ** 0.5
# Distributed output
out_dist = f_x + f_y
# Serial output
if MPI and comm.size > 1:
# We need to gather the summed values to compute the total sum over all procs.
local_sum = np.array(np.sum(g_x))
total_sum = local_sum.copy()
self.comm.Allreduce(local_sum, total_sum, op=MPI.SUM)
# total_sum
else:
# Recommended to make sure your code can run in serial too, for testing.
total_sum = np.sum(g_x)
num_x = len(x)
d_f_x__d_x = np.diag(2*x - 2.)
d_f_y__d_y = np.ones(num_x)*0.5*y**-0.5
d_g_y__d_y = 2*y + 3.
d_g_x__d_x = 0.5*x**-0.5
d_out_dist__d_x = d_f_x__d_x # square matrix
d_out_dist__d_y = d_f_y__d_y # num_x,1
d_out_serial__d_y = d_g_y__d_y # scalar
d_out_serial__d_x = g_y*d_g_x__d_x.reshape((1,num_x))
if mode == 'fwd':
if 'out_serial' in d_outputs:
if 'in_dist' in d_inputs:
d_outputs['out_serial'] += d_out_serial__d_x.dot(d_inputs['in_dist'])
if 'in_serial' in d_inputs:
d_outputs['out_serial'] += d_out_serial__d_y.dot(d_inputs['in_serial'])
elif mode == 'rev':
if 'out_serial' in d_outputs:
if 'in_dist' in d_inputs:
d_inputs['in_dist'] += d_out_serial__d_x.T.dot(d_outputs['out_serial'])
if 'in_serial' in d_inputs:
d_inputs['in_serial'] += total_sum*d_out_serial__d_y.T.dot(d_outputs['out_serial'])
size = 7
if MPI:
comm = MPI.COMM_WORLD
rank = comm.rank
sizes, offsets = evenly_distrib_idxs(comm.size, size)
else:
# When running in serial, the entire variable is on rank 0.
rank = 0
sizes = {rank : size}
offsets = {rank : 0}
prob = om.Problem()
model = prob.model
# Create a distributed source for the distributed input.
ivc = om.IndepVarComp()
ivc.add_output('x_dist', np.zeros(sizes[rank]), distributed=True)
ivc.add_output('x_serial', val=1)
model.add_subsystem("indep", ivc)
model.add_subsystem("D1", MixedDistrib2())
model.add_subsystem('con_cmp1', om.ExecComp('con1 = y**2'), promotes=['con1', 'y'])
model.connect('indep.x_dist', 'D1.in_dist')
model.connect('indep.x_serial', ['D1.in_serial','y'])
prob.driver = om.ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'SLSQP'
model.add_design_var('indep.x_serial', lower=5, upper=10)
model.add_constraint('con1', upper=90)
model.add_objective('D1.out_serial')
prob.setup(force_alloc_complex=True)
#prob.setup()
# Set initial values of distributed variable.
x_dist_init = np.ones(sizes[rank])
prob.set_val('indep.x_dist', x_dist_init)
# Set initial values of serial variable.
prob.set_val('indep.x_serial', 10)
prob.run_model()
prob.check_partials()
# prob.run_driver()
print('x_dist', prob.get_val('indep.x_dist', get_remote=True))
print('x_serial', prob.get_val('indep.x_serial'))
print('Obj', prob.get_val('D1.out_serial'))

Gradient flow stopped on a combined model

I meet with a problem that the gradient cannot backpropagate on a combined network. I checked lots of answers but cannot find a relevant solution to this problem. I would appreciate it so much if we can solve this.
I wanted to calculate the gradient for input data in this code:
for i, (input, target, impath) in tqdm(enumerate(data_loader)):
# print(‘input.shape:’, input.shape)
input = Variable(input.cuda(), requires_grad=True)
output = model(input)
loss = criterion(output, target.cuda())
loss = Variable(loss, requires_grad=True)
loss.backward()
print(‘input:’, input.grad.data)
but I got errror:
print(‘input:’, input.grad.data)
AttributeError: ‘NoneType’ object has no attribute ‘data’
and my model is a combined model that I loaded the parameters from two pretrained models.
I checked the requires_grad state-dict of model weights, it is true, however, the gradient of the model weights is None.
Is it because I load the state-dict that caused the gradient block?
How can I deal with this problem?
The model structure is attached below:
class resnet_model(nn.Module):
def __init__(self, opt):
super(resnet_model, self).__init__()
resnet = models.resnet101()
num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs, 1000)
if opt.resnet_path != None:
state_dict = torch.load(opt.resnet_path)
resnet.load_state_dict(state_dict)
print("resnet load state dict from {}".format(opt.resnet_path))
self.model1 = torch.nn.Sequential()
for chd in resnet.named_children():
if chd[0] != 'fc':
self.model1.add_module(chd[0], chd[1])
self.model2 = torch.nn.Sequential()
self.classifier = LINEAR_LOGSOFTMAX(input_dim=2048, nclass=200)
if opt.pretrained != None:
self.classifier_state_dict = torch.load('../checkpoint/{}_cls.pth'.format(opt.pretrained))
print("classifier load state dict from ../checkpoint/{}_cls.pth".format(opt.pretrained))
self.classifier.load_state_dict(self.classifier_state_dict)
for chd in self.classifier.named_children():
self.model2.add_module(chd[0], chd[1])
def forward(self, x):
x = self.model1(x)
x = x.view(-1, 2048)
x = self.model2(x)
return x
The problem is solved with this comment:
Why do you have this line: loss = Variable(loss, requires_grad=True) ?
Variable should not be used anymore.
So the line above should be deleted and to mark a Tensor for which you want gradients, you can use:
input = input.cuda().requires_grad_().

Tensorflow: 6 layer CNN: OOM (use 10Gb GPU memory)

I am using the following code for running a 6 layer CNN with 2 FC layers on top (on Tesla K-80 GPU).
Somehow, it consumes entire memory 10GB and died out of memory.I know that i can reduce the batch_size and then run , but i also want to run with 15 or 20 CNN layers.Whats wrong with the following code and why it takes all the memory? How should i run the code for 15 layers CNN.
Code:
import model
with tf.Graph().as_default() as g_train:
filenames = tf.train.match_filenames_once(FLAGS.train_dir+'*.tfrecords')
filename_queue = tf.train.string_input_producer(filenames, shuffle=True, num_epochs=FLAGS.num_epochs)
feats,labels = get_batch_input(filename_queue, batch_size=FLAGS.batch_size)
### feats size=(batch_size, 100, 50)
logits = model.inference(feats, FLAGS.batch_size)
loss = model.loss(logits, labels, feats)
tvars = tf.trainable_variables()
global_step = tf.Variable(0, name='global_step', trainable=False)
# Add to the Graph operations that train the model.
train_op = model.training(loss, tvars, global_step, FLAGS.learning_rate, FLAGS.clip_gradients)
# Add the Op to compare the logits to the labels during evaluation.
eval_correct = model.evaluation(logits, labels, feats)
summary_op = tf.merge_all_summaries()
saver = tf.train.Saver(tf.all_variables(), max_to_keep=15)
# The op for initializing the variables.
init_op = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init_op)
summary_writer = tf.train.SummaryWriter(FLAGS.model_dir,
graph=sess.graph)
# Start input enqueue threads.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
try:
step = 0
while not coord.should_stop():
_, loss_value = sess.run([train_op, loss])
if step % 100 == 0:
print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value))
# Update the events file.
summary_str = sess.run(summary_op)
summary_writer.add_summary(summary_str, step)
if (step == 0) or (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
ckpt_model = os.path.join(FLAGS.model_dir, 'model.ckpt')
saver.save(sess, ckpt_model, global_step=step)
#saver.save(sess, FLAGS.model_dir, global_step=step)
step += 1
except tf.errors.OutOfRangeError:
print('Done training for %d epochs, %d steps.' % (FLAGS.num_epochs, step))
finally:
coord.join(threads)
sess.close()
###################### File model.py ####################
def conv2d(x, W, b, strides=1):
# Conv2D wrapper, with bias and relu activation
x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1],
padding='SAME')
x = tf.nn.bias_add(x, b)
return tf.nn.relu(x)
def maxpool2d(x, k=2,s=2):
# MaxPool2D wrapper
return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, s,
s,1],padding='SAME')
def inference(feats,batch_size):
#feats size (batch_size,100,50,1) #batch_size=256
conv1_w=tf.get_variable("conv1_w", [filter_size,filter_size,1,256],initializer=tf.uniform_unit_scaling_initializer())
conv1_b=tf.get_variable("conv1_b",[256])
conv1 = conv2d(feats, conv1_w, conv1_b,2)
conv1 = maxpool2d(conv1, k=2,s=2)
### This was replicated for 6 layers and the 2 FC connected layers are added
return logits
def training(loss, train_vars, global_step, learning_rate, clip_gradients):
# Add a scalar summary for the snapshot loss.
tf.scalar_summary(loss.op.name, loss)
grads, _ = tf.clip_by_global_norm(tf.gradients(loss, train_vars,aggregation_method=1), clip_gradients)
optimizer = tf.train.AdamOptimizer(learning_rate)
train_op = optimizer.apply_gradients(zip(grads, train_vars), global_step=global_step)
return train_op
I am not too sure what the model python library is. If it is something you wrote and can change the setting in the optimizer I would suggest the following which I use in my own code
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cost, aggregation_method = tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)
By default the aggeragetion_method is ADD_N but if you change it to EXPERIMENTAL_ACCUMULATE_N or EXPERIMENTAL_TREE this will greatly save memory. The main memory hog in these programs is that tensorflow must save the output values at every neuron so that it can compute the gradients. Changing the aggregation_method helps a lot from my experience.
Also BTW I don't think there is anything wrong with your code. I can run out of memory on small cov-nets as well.

How to call numerical results to integrate a ODE using Runge-Kutta-4 in Python 3?

I'm trying to solve (for m_0) numerically the following ordinary differential equation:
dm0/dx=(((1-x)*(x*(2-x))**(1.5))/(k+x)**2)*(((x*(2-x))/3.0)*(dw/dx)**2 + ((8*(k+1))/(3*(k+x)))*w**2)
The values of w and dw/dx have been found already numerically using the Runge-Kutta 4th order and k is a factor that is fixed. I wrote a code where I call the values for w and dw/dx from an external file, then I organize them in an array, then I call the array in the function and then I run the integration. My outcome is not what it's expected :(, I don't know what is wrong. If anyone could give me a hand, it would be highly appreciated. Thank you!
from math import sqrt
from numpy import array,zeros,loadtxt
from printSoln import *
from run_kut4 import *
m = 1.15 # Just a constant.
k = 3.0*sqrt(1.0-(1.0/m))-1.0 # k in terms of m.
omegas = loadtxt("omega.txt",float) # Import values of w
domegas = loadtxt("domega.txt",float) # Import values of dw/dx
w = [] # Defines the array w to store the values w^2
s = 0.0
for s in omegas:
w.append(s**2) # Calculates the values w**2
omeg = array(w,float) # Array to store the value of w**2
dw = [] # Defines the array dw to store the values dw**2
t = 0.0
for t in domegas:
dw.append(t**2) # Calculates the values for dw**2
domeg = array(dw,float) # Array to store the values of dw**2
x = 1.0e-12 # Starting point of integration
xStop = (2.0 - k)/3.0 # Final point of integration
def F(x,y): # Define function to be integrated
F = zeros(1)
for i in domeg: # Loop to call w^2, (dw/dx)^2
for j in omeg:
F[0] = (((1.0-x)*(x*(2.0-x))**(1.5))/(k+x)**2)*((1.0/3.0)*x* (2.0-x)*domeg[i] + (8.0*(k+1.0)*omeg[j])/(3.0*(k+x)))
return F
y = array([((32.0*sqrt(2.0)*(k+1.0)*(x**2.5))/(15.0*(k**3)))]) # Initial condition for m_{0}
h = 1.0e-5 # Integration step
freq = 0 # Prints only initial and final values
X,Y = integrate(F,x,y,xStop,h) # Calls Runge-Kutta 4
printSoln(X,Y,freq) # Prints solution
Interpreting your verbal description, there is an ODE for omega, w'=F(x,w), and a coupled ODE for m0, m'=G(x,m,w,w'). The almost always optimal way to solve this is to treat it as system of ODE,
def ODEfunc(x,y)
w,m = y
dw = F(x,w)
dm = G(x,m,w,dw)
return np.array([dw, dm])
which you can then insert in the ODE solver of your choice, e.g., the fictitious
ODEintegrate(ODEfunc, xsamples, y0)

(in R) Why is result of ksvm using user-defined linear kernel different from that of ksvm using "vanilladot"?

I wanted to use user-defined kernel function for Ksvm in R.
so, I tried to make a vanilladot kernel and compare with "vanilladot" which is built in "kernlab" as practice.
I write my kernel as follow.
#
###vanilla kernel with class "kernel"
#
kfunction.k <- function(){
k <- function (x,y){crossprod(x,y)}
class(k) <- "kernel"
k}
l<-0.1 ; C<-1/(2*l)
###use kfunction.k
tmp<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel=kfunction.k(), C = C)
alpha(tmp)[[1]]
ind<-alphaindex(tmp)[[1]]
x.s<-x[ind,] ; y.s<-y[ind]
w.class.k<-t(alpha(tmp)[[1]]*y.s)%*%x.s
w.class.k
I thouhgt result of this operation is eqaul to that of following.
However It dosn't.
#
###use "vanilladot"
#
l<-0.1 ; C<-1/(2*l)
tmp1<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel="vanilladot", C = C)
alpha(tmp1)[[1]]
ind1<-alphaindex(tmp1)[[1]]
x.s<-x[ind1,] ; y.s<-y[ind1]
w.tmp1<-t(alpha(tmp1)[[1]]*y.s)%*%x.s
w.tmp1
I think maybe this problem is related to kernel class.
When class is set to "kernel", this problem is occured.
However When class is set to "vanillakernel", the result of ksvm using user-defined kernel is equal to that of ksvm using "vanilladot" which is built in Kernlab.
#
###vanilla kernel with class "vanillakernel"
#
kfunction.v.k <- function(){
k <- function (x,y){crossprod(x,y)}
class(k) <- "vanillakernel"
k}
# The only difference between kfunction.k and kfunction.v.k is "class(k)".
l<-0.1 ; C<-1/(2*l)
###use kfunction.v.k
tmp<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel=kfunction.v.k(), C = C)
alpha(tmp)[[1]]
ind<-alphaindex(tmp)[[1]]
x.s<-x[ind,] ; y.s<-y[ind]
w.class.v.k<-t(alpha(tmp)[[1]]*y.s)%*%x.s
w.class.v.k
I don't understand why the result is different from "vanilladot", when setting the class to "kernel".
Is there an error in my operation?
First, it seems like a really good question!
Now to the point. In the sources of ksvm we can find when is a line drawn between using user-defined kernel, and the built-ins:
if (type(ret) == "spoc-svc") {
if (!is.null(class.weights))
weightedC <- class.weights[weightlabels] * rep(C,
nclass(ret))
else weightedC <- rep(C, nclass(ret))
yd <- sort(y, method = "quick", index.return = TRUE)
xd <- matrix(x[yd$ix, ], nrow = dim(x)[1])
count <- 0
if (ktype == 4)
K <- kernelMatrix(kernel, x)
resv <- .Call("tron_optim", as.double(t(xd)), as.integer(nrow(xd)),
as.integer(ncol(xd)), as.double(rep(yd$x - 1,
2)), as.double(K), as.integer(if (sparse) xd#ia else 0),
as.integer(if (sparse) xd#ja else 0), as.integer(sparse),
as.integer(nclass(ret)), as.integer(count), as.integer(ktype),
as.integer(7), as.double(C), as.double(epsilon),
as.double(sigma), as.integer(degree), as.double(offset),
as.double(C), as.double(2), as.integer(0), as.double(0),
as.integer(0), as.double(weightedC), as.double(cache),
as.double(tol), as.integer(10), as.integer(shrinking),
PACKAGE = "kernlab")
reind <- sort(yd$ix, method = "quick", index.return = TRUE)$ix
alpha(ret) <- t(matrix(resv[-(nclass(ret) * nrow(xd) +
1)], nclass(ret)))[reind, , drop = FALSE]
coef(ret) <- lapply(1:nclass(ret), function(x) alpha(ret)[,
x][alpha(ret)[, x] != 0])
names(coef(ret)) <- lev(ret)
alphaindex(ret) <- lapply(sort(unique(y)), function(x)
which(alpha(ret)[,
x] != 0))
xmatrix(ret) <- x
obj(ret) <- resv[(nclass(ret) * nrow(xd) + 1)]
names(alphaindex(ret)) <- lev(ret)
svindex <- which(rowSums(alpha(ret) != 0) != 0)
b(ret) <- 0
param(ret)$C <- C
}
The important parts are two things, first, if we provide ksvm with our own kernel, then ktype=4 (while for vanillakernel, ktype=0) so it makes two changes:
in case of user-defined kernel, the kernel matrix is computed instead of actually using the kernel
tron_optim routine is ran with the information regarding the kernel
Now, in the svm.cpp we can find the tron routines, and in the tron_run (called from tron_optim), that LINEAR kernel has a separate optimization routine
if (param->kernel_type == LINEAR)
{
/* lots of code here */
while (Cpj < Cp)
{
totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w,
Cpj, Cnj, param->eps, sii, param->shrinking,
param->qpsize);
/* lots of code here */
}
totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w, Cp, Cn,
param->eps, sii, param->shrinking, param->qpsize);
delete[] w;
}
else
{
Solver_B s;
s.Solve(l, BSVC_Q(*prob,*param,y), minus_ones, y, alpha, Cp, Cn,
param->eps, sii, param->shrinking, param->qpsize);
}
As you can see, the linear case is treated in the more complex, more detailed way. There is an inner optimization loop calling the solver many times. It would require really deep analysis of actual optimization being performed here, but at this step one can answer your question in a following way:
There is no error in your operation
kernlab's svm has a separate routine for training SVM with linear kernel, which is based on the type of kernel passed to the code, changing "kernel" to "vanillakernel" made the ksvm think it is actually working with vanillakernel, and so performed this separate optimization routine
It does not seem as a bug in fact, as the linear SVM is in fact very different from the kernelized version in terms of efficient optimization techniques. Amount of heuristic as well as numerical issues that has to be taken care of is really big. As a result, some approximations are required and can lead to the different results. While for the rich feature space (like those induced by RBF kernel) it should not really matter, for simple kernels line linear ones - this simplifications can lead to significant output changes.

Resources