I'm trying to numerically solve the following ordinary differential equation for m_0:
dm0/dx=(((1-x)*(x*(2-x))**(1.5))/(k+x)**2)*(((x*(2-x))/3.0)*(dw/dx)**2 + ((8*(k+1))/(3*(k+x)))*w**2)
The values of w and dw/dx have already been computed numerically with a 4th-order Runge-Kutta method, and k is a fixed constant. My code reads the values of w and dw/dx from external files, stores them in arrays, uses those arrays inside the function to be integrated, and then runs the integration. The result is not what I expect, and I don't know what is wrong. If anyone could give me a hand, it would be highly appreciated. Thank you!
from math import sqrt
from numpy import array,zeros,loadtxt
from printSoln import *
from run_kut4 import *
m = 1.15 # Just a constant.
k = 3.0*sqrt(1.0-(1.0/m))-1.0 # k in terms of m.
omegas = loadtxt("omega.txt",float) # Import values of w
domegas = loadtxt("domega.txt",float) # Import values of dw/dx
w = [] # Defines the list w to store the values w^2
s = 0.0
for s in omegas:
    w.append(s**2) # Calculates the values w**2
omeg = array(w,float) # Array to store the value of w**2
dw = [] # Defines the list dw to store the values dw**2
t = 0.0
for t in domegas:
    dw.append(t**2) # Calculates the values for dw**2
domeg = array(dw,float) # Array to store the values of dw**2
x = 1.0e-12 # Starting point of integration
xStop = (2.0 - k)/3.0 # Final point of integration
def F(x,y): # Define function to be integrated
    F = zeros(1)
    for i in domeg: # Loop to call w^2, (dw/dx)^2
        for j in omeg:
            F[0] = (((1.0-x)*(x*(2.0-x))**(1.5))/(k+x)**2)*((1.0/3.0)*x* (2.0-x)*domeg[i] + (8.0*(k+1.0)*omeg[j])/(3.0*(k+x)))
    return F
y = array([((32.0*sqrt(2.0)*(k+1.0)*(x**2.5))/(15.0*(k**3)))]) # Initial condition for m_{0}
h = 1.0e-5 # Integration step
freq = 0 # Prints only initial and final values
X,Y = integrate(F,x,y,xStop,h) # Calls Runge-Kutta 4
printSoln(X,Y,freq) # Prints solution

As I read your description, there is an ODE for omega, w' = F(x,w), and a coupled ODE for m0, m' = G(x,m,w,w'). The best approach is almost always to treat the two as one system of ODEs:
import numpy as np
def ODEfunc(x,y):
    w,m = y
    dw = F(x,w)
    dm = G(x,m,w,dw)
    return np.array([dw, dm])
which you can then insert in the ODE solver of your choice, e.g., the fictitious
ODEintegrate(ODEfunc, xsamples, y0)
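As a minimal sketch of that idea, here is a runnable version with toy right-hand sides. The F and G below are hypothetical placeholders chosen so the exact solution is known (w = exp(-x), m = 1 - exp(-2x)); they are not the equations from the question. The rk4 helper is a hand-written stand-in for run_kut4, whose interface I am not assuming.

```python
import numpy as np

def F(x, w):
    # Hypothetical right-hand side of w' = F(x, w).
    return -w

def G(x, m, w, dw):
    # Hypothetical right-hand side of m' = G(x, m, w, w').
    return w**2 + dw**2

def ode_func(x, y):
    # y holds both unknowns, so w and w' are available when m' is evaluated.
    w, m = y
    dw = F(x, w)
    dm = G(x, m, w, dw)
    return np.array([dw, dm])

def rk4(f, x0, y0, x_stop, n_steps):
    # Classical 4th-order Runge-Kutta with a fixed step size.
    h = (x_stop - x0) / n_steps
    x, y = x0, np.asarray(y0, dtype=float)
    for _ in range(n_steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h / 2 * k1)
        k3 = f(x + h / 2, y + h / 2 * k2)
        k4 = f(x + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return x, y

x_end, y_end = rk4(ode_func, 0.0, [1.0, 0.0], 1.0, 1000)
print(x_end, y_end)
```

Because w and m advance together, there is no need to store w and dw/dx in files and look them up by index inside the right-hand side.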


MethodError with julia: cannot `convert` an object of type Matrix{ComplexF64}

I have been working with Scilab and decided to switch to Julia, but I'm running into some errors that I haven't managed to solve. For instance, I would like to fill a vector with values of a given function, but I get this error. Here is the code that I used:
using LinearAlgebra
A = [5/12 -1/12; 3/4 1/4]; c=[1/3;1]; b=[3/4; 1/4];
N = 10; T = 4; ts = (0:N)*T/N;
dt = T/N; λ = 10^(-14/(2*N+1));
m=length(c) ;
em0=b'/A # b^t * inv(A)
em1 = 1 .-em0*ones(m,1)
γ(z) = @. z/(1.0 -z*em1)
The over-arching issue you are facing is that, coming from Scilab, you are probably not used to distinguishing scalars, vectors and matrices. Like in Matlab, Scilab scalars are really 1x1 matrices, and vectors are really Nx1 or 1xN matrices.
This is very different in Julia. A scalar is not the same as a 1x1 matrix, and a vector is not the same as a Nx1 matrix. You should therefore take care to distinguish them. In particular, you should avoid creating a matrix, zeros(M, 1), when what you really need is a vector, zeros(M).
The direct reason for the error message is that γ(im) is a matrix, because em1 is a matrix:
julia> γ(im)
1×1 Matrix{ComplexF64}:
0.0 + 1.0im
u_hat is also a matrix of ComplexF64, and you are trying to assign a matrix to one of its elements. That cannot work: only scalar values can be elements of a Matrix{ComplexF64}.
I took the liberty of writing a cleaned up version of your code:
A = [5/12 -1/12; 3/4 1/4]
# use commas when defining vectors (this is just about style)
b = [3/4, 1/4]
N = 10
## None of the below variables are used. Try to make your example minimal
c = [1/3, 1]
T = 4
dt = T/N;
ts = (0:N) .* dt
λ = 10^(-14/(2*N+1))
m = length(c)
############### <- not used
# prefer vectors over 1xN or Nx1 matrices
em0 = A' \ b
# the dot product of a vector with a vector of ones is just a sum, but super wasteful and slow.
em1 = 1 - sum(em0)
# don't use global variables(!!!), and remove the `@.`
γ(z, a) = z / (1 - z * a)
# use vectors, not 1xN matrices, and directly create a complex matrix instead of converting a real one.
û = zeros(ComplexF64, N+1)
# Now this works
û[1] = γ(im, em1)
I renamed u_hat to û for fun.
Also: remember to put your code in a function, always.
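For readers who want to sanity-check the numbers outside Julia, here is a rough NumPy equivalent of the cleaned-up code. It is only a sketch: the names em0, em1, gamma and u_hat mirror the Julia code, and np.linalg.solve(A.T, b) stands in for A' \ b. Working by hand, b' * inv(A) comes out as [0, 1], so em1 is 0 and γ(im) is just im, which matches the 0.0 + 1.0im shown above.

```python
import numpy as np

A = np.array([[5 / 12, -1 / 12], [3 / 4, 1 / 4]])
b = np.array([3 / 4, 1 / 4])  # 1-D vector, not a 2x1 matrix
N = 10

# Solve A^T * em0 = b, the counterpart of Julia's  A' \ b
em0 = np.linalg.solve(A.T, b)
# Plain scalar, because em0 is 1-D and sum() reduces it to a number
em1 = 1 - em0.sum()

def gamma(z, a):
    # Scalar in, scalar out; the parameter a is passed in rather than read from a global
    return z / (1 - z * a)

u_hat = np.zeros(N + 1, dtype=complex)
u_hat[0] = gamma(1j, em1)
print(em0, em1, u_hat[0])
```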
To pinpoint the root of the problem:
It lies where you define em1 as em1 = 1 .-em0*ones(m,1). Since em0*ones(m,1) is meant to be a scalar, you can extract its single element with the only function (I'm not commenting on your overall approach, which is outside the scope of this answer):
julia> using LinearAlgebra
# Note that with this modification, there isn't any need for `@.` anymore.
julia> γ(z) = z/(1.0 -z*em1)
γ (generic function with 1 method)
julia> A = [5/12 -1/12; 3/4 1/4]; c=[1/3;1]; b=[3/4; 1/4];
N = 10; T = 4; ts = (0:N)*T/N;
dt = T/N; λ = 10^(-14/(2*N+1));
#This is where the problem can be solved
em1 = 1 - only(em0*ones(m,1));
0.0 + 1.0im
julia> u_hat
1×11 Matrix{ComplexF64}:
0.0+1.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im … 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im 0.0+0.0im

Optimizing Distributed I/O with serial output

I am having trouble understanding how to optimize a distributed component with a serial output. This is my attempt with an example problem given in the openmdao docs.
import numpy as np
import openmdao.api as om
from openmdao.utils.array_utils import evenly_distrib_idxs
from openmdao.utils.mpi import MPI
class MixedDistrib2(om.ExplicitComponent):
    def setup(self):
        # Distributed Input
        self.add_input('in_dist', shape_by_conn=True, distributed=True)
        # Serial Input
        self.add_input('in_serial', val=1)
        # Distributed Output
        self.add_output('out_dist', copy_shape='in_dist', distributed=True)
        # Serial Output
        self.add_output('out_serial', copy_shape='in_serial')
        #self.declare_partials('*','*', method='cs')
    def compute(self, inputs, outputs):
        x = inputs['in_dist']
        y = inputs['in_serial']
        # "Computationally Intensive" operation that we wish to parallelize.
        f_x = x**2 - 2.0*x + 4.0
        # These operations are repeated on all procs.
        f_y = y ** 0.5
        g_y = y**2 + 3.0*y - 5.0
        # Compute square root of our portion of the distributed input.
        g_x = x ** 0.5
        # Distributed output
        outputs['out_dist'] = f_x + f_y
        # Serial output
        if MPI and comm.size > 1:
            # We need to gather the summed values to compute the total sum over all procs.
            local_sum = np.array(np.sum(g_x))
            total_sum = local_sum.copy()
            self.comm.Allreduce(local_sum, total_sum, op=MPI.SUM)
            outputs['out_serial'] = g_y * total_sum
        else:
            # Recommended to make sure your code can run in serial too, for testing.
            outputs['out_serial'] = g_y * np.sum(g_x)
size = 7
if MPI:
    comm = MPI.COMM_WORLD
    rank = comm.rank
    sizes, offsets = evenly_distrib_idxs(comm.size, size)
else:
    # When running in serial, the entire variable is on rank 0.
    rank = 0
    sizes = {rank : size}
    offsets = {rank : 0}
prob = om.Problem()
model = prob.model
# Create a distributed source for the distributed input.
ivc = om.IndepVarComp()
ivc.add_output('x_dist', np.zeros(sizes[rank]), distributed=True)
ivc.add_output('x_serial', val=1)
model.add_subsystem("indep", ivc)
model.add_subsystem("D1", MixedDistrib2())
model.add_subsystem('con_cmp1', om.ExecComp('con1 = y**2'), promotes=['con1', 'y'])
model.connect('indep.x_dist', 'D1.in_dist')
model.connect('indep.x_serial', ['D1.in_serial','y'])
prob.driver = om.ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'SLSQP'
model.add_design_var('indep.x_serial', lower=5, upper=10)
model.add_constraint('con1', upper=90)
# Set initial values of distributed variable.
x_dist_init = [1,1,1,1,1,1,1]
prob.set_val('indep.x_dist', x_dist_init)
# Set initial values of serial variable.
prob.set_val('indep.x_serial', 10)
print('x_dist', prob.get_val('indep.x_dist', get_remote=True))
print('x_serial', prob.get_val('indep.x_serial'))
print('Obj', prob.get_val('D1.out_serial'))
The problem is with defining partials with 'fd' or 'cs'. I cannot declare partials of a serial output with respect to a distributed input, so I used prob.setup(force_alloc_complex=True) to use complex step. But that gives me this warning: DerivativesWarning:Constraints or objectives [('D1.out_serial', inds=[0])] cannot be impacted by the design variables of the problem. I understand that the warning appears because the total derivative is 0, but I don't understand why it is 0. Clearly the total derivative should not be 0 here. I guess this is because I didn't explicitly call declare_partials in the component. I tried removing the distributed components and ran it again with declare_partials, and this works correctly (code below).
import numpy as np
import openmdao.api as om
class MixedDistrib2(om.ExplicitComponent):
    def setup(self):
        self.add_input('in_dist', np.zeros(7))
        self.add_input('in_serial', val=1)
        self.add_output('out_serial', val=0)
        self.declare_partials('*','*', method='cs')
    def compute(self, inputs, outputs):
        x = inputs['in_dist']
        y = inputs['in_serial']
        g_y = y**2 + 3.0*y - 5.0
        g_x = x ** 0.5
        outputs['out_serial'] = g_y * np.sum(g_x)
prob = om.Problem()
model = prob.model
model.add_subsystem("D1", MixedDistrib2(), promotes_inputs=['in_dist', 'in_serial'], promotes_outputs=['out_serial'])
model.add_subsystem('con_cmp1', om.ExecComp('con1 = in_serial**2'), promotes=['con1', 'in_serial'])
prob.driver = om.ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'SLSQP'
model.add_design_var('in_serial', lower=5, upper=10)
model.add_constraint('con1', upper=90)
prob.set_val('in_dist', [1,1,1,1,1,1,1])
prob.set_val('in_serial', 10)
print('x_dist', prob.get_val('in_dist', get_remote=True))
print('x_serial', prob.get_val('in_serial'))
print('Obj', prob.get_val('out_serial'))
What I am trying to understand is:
How do I use 'fd' or 'cs' in a distributed component with a serial output?
What does prob.setup(force_alloc_complex=True) mean? Doesn't it force all components in the problem to use cs? If so, why does the total derivative become 0?
When I run your code in OpenMDAO V 3.11.0 (after uncommenting the declare_partials call) I get the following error:
RuntimeError: 'D1' <class MixedDistrib2>: component has defined partial ('out_serial', 'in_dist') which is a serial output wrt a distributed input. This is only supported using the matrix free API.
As the error indicates, you can't use the matrix-based API for derivatives in this situation. The reasons why are a bit subtle, and probably outside the scope of what needs to be dealt with to answer your question here. It boils down to OpenMDAO not knowing what kind of distributed operations are being done in compute and having no way to manage those details when you propagate things in reverse.
So you need to use the matrix-free derivative APIs in this situation. When you use the matrix-free APIs you DO NOT declare any partials, because you don't want OpenMDAO to allocate any memory for you to store partials in (and you wouldn't use that memory even if it did).
I've coded them for your example here, but I need to note a few important details:
Your example has a distributed IVC, but as of OpenMDAO V3.11.0 you can't get total derivatives with respect to distributed design variables. I assume you just made it that way to make your simple test case, but in case your real problem was set up this way, you need to note this and not do it this way. Instead, make the IVC serial, and use src indices to distribute the correct parts to each proc.
In the example below, the derivatives are correct. However, there seems to be a bug in the check_partials output when running in parallel, so the reverse mode partials look like they are off by a factor of the comm size. This will have to be fixed in a later release.
I only did the derivatives for out_serial. out_dist will work similarly and is left as an exercise for the reader :)
You'll notice that I duplicated some code in the compute and compute_jacvec_product methods. You can abstract this duplicate code out into its own method (or call compute from within compute_jacvec_product by providing your own output dictionary). However, you might be asking why the duplicate call is needed at all. Why can't you store the values from the compute call? The answer is, in large part, that OpenMDAO does not guarantee that compute is always called before compute_jacvec_product. However, I'll also point out that this kind of code duplication is very AD-like. Any AD code will have the same kind of duplication built in, even though you don't see it.
import numpy as np
import openmdao.api as om
from openmdao.utils.array_utils import evenly_distrib_idxs
from openmdao.utils.mpi import MPI
class MixedDistrib2(om.ExplicitComponent):
    def setup(self):
        # Distributed Input
        self.add_input('in_dist', shape_by_conn=True, distributed=True)
        # Serial Input
        self.add_input('in_serial', val=1)
        # Distributed Output
        self.add_output('out_dist', copy_shape='in_dist', distributed=True)
        # Serial Output
        self.add_output('out_serial', copy_shape='in_serial')
        # self.declare_partials('*','*', method='fd')
    def compute(self, inputs, outputs):
        x = inputs['in_dist']
        y = inputs['in_serial']
        # "Computationally Intensive" operation that we wish to parallelize.
        f_x = x**2 - 2.0*x + 4.0
        # These operations are repeated on all procs.
        f_y = y ** 0.5
        g_y = y**2 + 3.0*y - 5.0
        # Compute square root of our portion of the distributed input.
        g_x = x ** 0.5
        # Distributed output
        outputs['out_dist'] = f_x + f_y
        # Serial output
        if MPI and comm.size > 1:
            # We need to gather the summed values to compute the total sum over all procs.
            local_sum = np.array(np.sum(g_x))
            total_sum = local_sum.copy()
            self.comm.Allreduce(local_sum, total_sum, op=MPI.SUM)
            outputs['out_serial'] = g_y * total_sum
        else:
            # Recommended to make sure your code can run in serial too, for testing.
            outputs['out_serial'] = g_y * np.sum(g_x)
    def compute_jacvec_product(self, inputs, d_inputs, d_outputs, mode):
        x = inputs['in_dist']
        y = inputs['in_serial']
        g_y = y**2 + 3.0*y - 5.0
        # "Computationally Intensive" operation that we wish to parallelize.
        f_x = x**2 - 2.0*x + 4.0
        # These operations are repeated on all procs.
        f_y = y ** 0.5
        # Compute square root of our portion of the distributed input.
        g_x = x ** 0.5
        # Distributed output
        out_dist = f_x + f_y
        # Serial output
        if MPI and comm.size > 1:
            # We need to gather the summed values to compute the total sum over all procs.
            local_sum = np.array(np.sum(g_x))
            total_sum = local_sum.copy()
            self.comm.Allreduce(local_sum, total_sum, op=MPI.SUM)
        else:
            # Recommended to make sure your code can run in serial too, for testing.
            total_sum = np.sum(g_x)
        num_x = len(x)
        d_f_x__d_x = np.diag(2*x - 2.)
        d_f_y__d_y = np.ones(num_x)*0.5*y**-0.5
        d_g_y__d_y = 2*y + 3.
        d_g_x__d_x = 0.5*x**-0.5
        d_out_dist__d_x = d_f_x__d_x # square matrix
        d_out_dist__d_y = d_f_y__d_y # num_x,1
        d_out_serial__d_y = d_g_y__d_y # scalar; still needs the total_sum factor below
        d_out_serial__d_x = g_y*d_g_x__d_x.reshape((1,num_x))
        if mode == 'fwd':
            if 'out_serial' in d_outputs:
                if 'in_dist' in d_inputs:
                    d_outputs['out_serial'] += d_out_serial__d_x.dot(d_inputs['in_dist'])
                if 'in_serial' in d_inputs:
                    d_outputs['out_serial'] += total_sum*d_out_serial__d_y.dot(d_inputs['in_serial'])
        elif mode == 'rev':
            if 'out_serial' in d_outputs:
                if 'in_dist' in d_inputs:
                    d_inputs['in_dist'] += d_out_serial__d_x.T.dot(d_outputs['out_serial'])
                if 'in_serial' in d_inputs:
                    d_inputs['in_serial'] += total_sum*d_out_serial__d_y.T.dot(d_outputs['out_serial'])
size = 7
if MPI:
    comm = MPI.COMM_WORLD
    rank = comm.rank
    sizes, offsets = evenly_distrib_idxs(comm.size, size)
else:
    # When running in serial, the entire variable is on rank 0.
    rank = 0
    sizes = {rank : size}
    offsets = {rank : 0}
prob = om.Problem()
model = prob.model
# Create a distributed source for the distributed input.
ivc = om.IndepVarComp()
ivc.add_output('x_dist', np.zeros(sizes[rank]), distributed=True)
ivc.add_output('x_serial', val=1)
model.add_subsystem("indep", ivc)
model.add_subsystem("D1", MixedDistrib2())
model.add_subsystem('con_cmp1', om.ExecComp('con1 = y**2'), promotes=['con1', 'y'])
model.connect('indep.x_dist', 'D1.in_dist')
model.connect('indep.x_serial', ['D1.in_serial','y'])
prob.driver = om.ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'SLSQP'
model.add_design_var('indep.x_serial', lower=5, upper=10)
model.add_constraint('con1', upper=90)
# Set initial values of distributed variable.
x_dist_init = np.ones(sizes[rank])
prob.set_val('indep.x_dist', x_dist_init)
# Set initial values of serial variable.
prob.set_val('indep.x_serial', 10)
# prob.run_driver()
print('x_dist', prob.get_val('indep.x_dist', get_remote=True))
print('x_serial', prob.get_val('indep.x_serial'))
print('Obj', prob.get_val('D1.out_serial'))
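To see what the fwd and rev branches of compute_jacvec_product are supposed to compute, here is a small NumPy-only sketch for the serial output, with no OpenMDAO and no MPI involved. The helper names (g, partials, jacvec_fwd, jacvec_rev) are made up for this illustration. Forward mode multiplies the Jacobian by an input perturbation, reverse mode multiplies its transpose by an output perturbation, and the two must agree in the dot-product sense.

```python
import numpy as np

def g(y):
    return y**2 + 3.0 * y - 5.0

def out_serial(x, y):
    # Same formula as the serial output in compute (single process only).
    return g(y) * np.sum(np.sqrt(x))

def partials(x, y):
    # d(out_serial)/dx is a vector, d(out_serial)/dy is a scalar.
    d_dx = g(y) * 0.5 / np.sqrt(x)
    d_dy = (2.0 * y + 3.0) * np.sum(np.sqrt(x))
    return d_dx, d_dy

def jacvec_fwd(x, y, dx, dy):
    # Forward mode: J @ [dx, dy] gives the output perturbation.
    d_dx, d_dy = partials(x, y)
    return d_dx @ dx + d_dy * dy

def jacvec_rev(x, y, d_out):
    # Reverse mode: J^T @ d_out gives the input perturbations.
    d_dx, d_dy = partials(x, y)
    return d_dx * d_out, d_dy * d_out

x = np.ones(7)
y = 10.0
dx = np.ones(7)
dy = 1.0

fwd = jacvec_fwd(x, y, dx, dy)
rev_x, rev_y = jacvec_rev(x, y, 2.0)

# Finite-difference check of the forward product.
eps = 1e-6
fd = (out_serial(x + eps * dx, y + eps * dy) - out_serial(x, y)) / eps
print(fwd, fd, rev_x @ dx + rev_y * dy)
```

Note that no Jacobian matrix is ever stored; only products with it are formed, which is the point of the matrix-free API.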

Julia MethodError: no method matching parseNLExpr_runtime(

I'm attempting to code the method described here to estimate production functions of metal manufacturers. I've done this in Python and Matlab, but am trying to learn Julia.
spain_clean.csv is a dataset of log capital (lnk), log labor (lnl), log output (lnva), and log materials (lnm) that I am loading. Lagged variables are denoted with an "l" before them.
Code is at the bottom. I am getting an error:
ERROR: LoadError: MethodError: no method matching parseNLExpr_runtime(::JuMP.Model, ::JuMP.GenericQuadExpr{Float64,JuMP.Variable}, ::Array{ReverseDiffSparse.NodeData,1}, ::Int32, ::Array{Float64,1})
I think it has to do with the use of vector sums and arrays going into the non-linear objective, but I do not understand Julia enough to debug this.
using JuMP # Need to say it whenever we use JuMP
using Clp, Ipopt # Loading the GLPK module for using its solver
using CSV # csv reader
# read data
df = CSV.read("spain_clean.csv")
acf = Model(solver=IpoptSolver())
@variable(acf, -10<= b0 <= 10) #
@variable(acf, -5 <= bk <= 5 ) #
@variable(acf, -5 <= bl <= 5 ) #
@variable(acf, -10<= g1 <= 10) #
const g = sum(df[:phihat]-b0-bk* df[:lnk]-bl* df[:lnl]-g1* (df[:lphihat]-b0-bk* df[:llnk]-bl* df[:llnl]))
const gllnk = sum((df[:phihat]-b0-bk* df[:lnk]-bl* df[:lnl]-g1* (df[:lphihat]-b0-bk* df[:llnk]-bl* df[:llnl])).*df[:llnk])
const gllnl = sum((df[:phihat]-b0-bk* df[:lnk]-bl* df[:lnl]-g1* (df[:lphihat]-b0-bk* df[:llnk]-bl* df[:llnl])).*df[:llnl])
const glphihat = sum((df[:phihat]-b0-bk* df[:lnk]-bl* df[:lnl]-g1* (df[:lphihat]-b0-bk* df[:llnk]-bl* df[:llnl])).*df[:lphihat])
@NLobjective(acf, Min, g* g + gllnk* gllnk + gllnl* gllnk + glphihat* glphihat)
status = solve(acf) # solves the model
println("Objective value: ", getobjectivevalue(acf)) # getObjectiveValue(model_name) gives the optimum objective value
println("b0 = ", getvalue(b0))
println("bk = ", getvalue(bk))
println("bl = ", getvalue(bl))
println("g1 = ", getvalue(g1))
I'm not an expert in Julia, but I think a couple of things are wrong with your code.
First, constants are not supposed to change during the iterations, yet you are making them functions of the decision variables. Second, what you want there are nonlinear expressions instead of constants. So instead of the const definitions, write:
N = size(df, 1)
@NLexpression(acf, g, sum(df[i, :phihat]-b0-bk* df[i, :lnk]-bl* df[i, :lnl]-g1* (df[i, :lphihat]-b0-bk* df[i, :llnk]-bl* df[i, :llnl]) for i=1:N))
@NLexpression(acf, gllnk, sum((df[i,:phihat]-b0-bk* df[i,:lnk]-bl* df[i,:lnl]-g1* (df[i,:lphihat]-b0-bk* df[i,:llnk]-bl* df[i,:llnl]))*df[i,:llnk] for i=1:N))
@NLexpression(acf,gllnl,sum((df[i,:phihat]-b0-bk* df[i,:lnk]-bl* df[i,:lnl]-g1* (df[i,:lphihat]-b0-bk* df[i,:llnk]-bl* df[i,:llnl]))*df[i,:llnl] for i=1:N))
@NLexpression(acf,glphihat,sum((df[i,:phihat]-b0-bk* df[i,:lnk]-bl* df[i,:lnl]-g1* (df[i,:lphihat]-b0-bk* df[i,:llnk]-bl* df[i,:llnl]))*df[i,:lphihat] for i=1:N))
I tested this and it seems to work.
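To illustrate why the moments cannot be constants, here is a small NumPy sketch in which they are written as functions of the parameters. Everything in it is an assumption made for illustration: the data are synthetic stand-ins for the columns of spain_clean.csv, the "true" parameter values are invented, and the objective is taken to be the plain sum of squared moments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Synthetic stand-ins for the data columns (not real data).
lnk, lnl, llnk, llnl, lphihat = rng.normal(size=(5, n))
true = dict(b0=1.0, bk=0.4, bl=0.6, g1=0.8)  # invented values

# Build phihat so that the residual vanishes at the "true" parameters.
phihat = (true["b0"] + true["bk"] * lnk + true["bl"] * lnl
          + true["g1"] * (lphihat - true["b0"] - true["bk"] * llnk - true["bl"] * llnl))

def residual(b0, bk, bl, g1):
    # Same expression as inside each sum in the question's code.
    return (phihat - b0 - bk * lnk - bl * lnl
            - g1 * (lphihat - b0 - bk * llnk - bl * llnl))

def objective(b0, bk, bl, g1):
    xi = residual(b0, bk, bl, g1)
    # The four moments change whenever the parameters change.
    moments = np.array([xi.sum(), (xi * llnk).sum(),
                        (xi * llnl).sum(), (xi * lphihat).sum()])
    return float(moments @ moments)

print(objective(**true))
print(objective(0.0, 0.0, 0.0, 0.0))
```

The solver has to re-evaluate these sums at every trial point, which is what @NLexpression expresses and what a const evaluated once cannot.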

Error when implementing RBF kernel bandwidth differentiation in Pytorch

I'm implementing an RBF network using some beginner examples from the PyTorch website. I have a problem when implementing the kernel bandwidth differentiation for the network. Also, I would like to know whether my attempt to implement the idea is sound. This is a code sample to reproduce the issue. Thanks
# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable
def kernel_product(x,y, mode = "gaussian", s = 1.):
    x_i = x.unsqueeze(1)
    y_j = y.unsqueeze(0)
    xmy = ((x_i-y_j)**2).sum(2)
    if mode == "gaussian" : K = torch.exp( - xmy/s**2 )
    elif mode == "laplace" : K = torch.exp( - torch.sqrt(xmy + (s**2)))
    elif mode == "energy" : K = torch.pow( xmy + (s**2), -.25 )
    return torch.t(K)
class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """
    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)
    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input
dtype = torch.cuda.FloatTensor
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)
# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(H, D_in).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
# I've created this scalar variable (the kernel bandwidth)
s = Variable(torch.randn(1).type(dtype), requires_grad=True)
learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use Function.apply method. We alias this as 'relu'.
    relu = MyReLU.apply
    # Forward pass: compute predicted y using operations on Variables; we compute
    # ReLU using our custom autograd operation.
    # y_pred = relu(
    y_pred = relu(kernel_product(w1, x, s)).mm(w2)
    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    # Use autograd to compute the backward pass.
    loss.backward()
    # Update weights using gradient descent
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data
    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()
However, I get this error, which disappears when I simply use a fixed scalar as the default input parameter of kernel_product():
RuntimeError: eq() received an invalid combination of arguments - got (str), but expected one of:
* (float other)
didn't match because some of the arguments have invalid types: (str)
* (Variable other)
didn't match because some of the arguments have invalid types: (str)
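Well, you are calling kernel_product(w1, x, s), where w1, x and s are torch Variables, while the function is defined as kernel_product(x,y, mode = "gaussian", s = 1.). The third positional argument is therefore bound to mode, which should be a string specifying the kernel, and the comparison mode == "gaussian" fails on a Variable. Pass the bandwidth by keyword instead: kernel_product(w1, x, s=s).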
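The argument-binding issue can be reproduced without PyTorch. The sketch below is a NumPy stand-in with the same signature as the question's kernel_product; the explicit ValueError for an unknown mode is my own addition, so that the mistake fails loudly instead of leaving K undefined.

```python
import numpy as np

def kernel_product(x, y, mode="gaussian", s=1.0):
    # Pairwise squared distances between rows of x and rows of y.
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    if mode == "gaussian":
        K = np.exp(-sq_dist / s**2)
    elif mode == "laplace":
        K = np.exp(-np.sqrt(sq_dist + s**2))
    elif mode == "energy":
        K = (sq_dist + s**2) ** -0.25
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return K.T

x = np.zeros((3, 2))
y = np.ones((4, 2))

# Correct: the bandwidth is passed by keyword, mode keeps its default.
K = kernel_product(x, y, s=2.0)
print(K.shape)

# Wrong: the third positional argument lands in `mode`, not in `s`.
try:
    kernel_product(x, y, 2.0)
except ValueError as err:
    print("error:", err)
```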

vectorize complex slicing with pandas dataframe

I'd like to vectorize this piece of code for speed. The purpose is to calculate a function, in this case a standard deviation, over each pair of dates, where the start and end dates are contained in two separate arrays.
import pandas as pd
import numpy as np
asd_1 = pd.Series(0.01 * np.random.randn(252), index=pd.date_range('2011-1-1', periods=252))
index_1 = pd.to_datetime(['2011-2-2', '2011-4-3', '2011-5-1',])
index_2 = pd.to_datetime(['2011-2-15', '2011-4-16', '2011-5-17',])
index_tot = list(zip(index_1,index_2))
aux_learning_std = pd.DataFrame([np.nanstd(asd_1.loc[i:j]) for i, j in index_tot], index=index_1)
The solution above works, but it uses a loop; I'd rather vectorize it with numpy/pandas, which should be much faster. Initially I thought about using something like:
df_aux = pd.concat([asd_1 for _ in range(len(index_1))], axis=1)
results = df_aux.apply(lambda x: np.nanstd(x.loc[i,j]), axis = 0)
but here I fail to put together the vectors into one operation.
any and all advice is welcome.
p.s.: below there is an image for explanatory purposes
Vectorized standard deviation across ranges in an array
def get_ranges_arr(starts,ends):
    # Taken from
    counts = ends - starts
    counts_csum = counts.cumsum()
    id_arr = np.ones(counts_csum[-1],dtype=int)
    id_arr[0] = starts[0]
    id_arr[counts_csum[:-1]] = starts[1:] - ends[:-1] + 1
    return id_arr.cumsum()
def ranged_std(arr,starts,ends):
    # Get all indices and the IDs corresponding to same groups
    idx = get_ranges_arr(starts,ends)
    id_arr = np.repeat(np.arange(starts.size),ends-starts)
    # Extract relevant data
    slice_arr = arr[idx]
    # Simulate standard deviation implementation for a number of groups
    # using id_arr as the basis to perform various mathematical operations
    # within each group. Since, std. deviation performs sum/mean reduction,
    # we can simply use np.bincount for an efficient implementation.
    # Std. deviation formula used :
    grp_counts = np.bincount(id_arr)
    mean_vals = np.bincount(id_arr,slice_arr)/grp_counts
    abs_vals = np.abs(slice_arr - mean_vals[id_arr])**2
    return np.sqrt(np.bincount(id_arr,abs_vals)/grp_counts)
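The core trick in ranged_std is using np.bincount with weights as a grouped sum. Here is a tiny standalone illustration on made-up data (the group_ids and values arrays are invented for this example), computing the population standard deviation per group in the same three steps:

```python
import numpy as np

group_ids = np.array([0, 0, 0, 1, 1])            # which group each value belongs to
values = np.array([1.0, 2.0, 3.0, 10.0, 20.0])

counts = np.bincount(group_ids)                   # number of values per group
means = np.bincount(group_ids, values) / counts   # weighted bincount = per-group sum
sq_dev = (values - means[group_ids]) ** 2         # broadcast each group's mean back
stds = np.sqrt(np.bincount(group_ids, sq_dev) / counts)

print(means, stds)
```

Dividing by the group count gives the same result as np.std with its default ddof=0, which is also what np.nanstd uses.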
Sample run (verify against a loopy version)
In [173]: arr = np.random.randint(0,9,(20))
In [174]: starts = np.array([2,6,11])
In [175]: ends = np.array([8,9,15])
In [176]: [np.std(arr[i:j]) for i,j in zip(starts,ends)]
Out[176]: [1.9720265943665387, 0.81649658092772603, 0.82915619758884995]
In [177]: ranged_std(arr,starts,ends)
Out[177]: array([ 1.97202659, 0.81649658, 0.8291562 ])
Runtime test
Case #1 : Very small number of ranges 3
In [21]: arr = np.random.randint(0,9,(20))
In [22]: starts = np.array([2,6,11])
In [23]: ends = np.array([8,9,15])
In [24]: %timeit [np.std(arr[i:j]) for i,j in zip(starts,ends)]
10000 loops, best of 3: 146 µs per loop
In [25]: %timeit ranged_std(arr,starts,ends)
10000 loops, best of 3: 45 µs per loop
Case #2 : Decent number of ranges 1000
In [32]: arr = np.random.randint(0,9,(1010))
In [33]: starts = np.random.randint(0,9,(1000))
In [34]: ends = starts + np.random.randint(0,9,(1000))
In [35]: %timeit [np.std(arr[i:j]) for i,j in zip(starts,ends)]
10 loops, best of 3: 47.5 ms per loop
In [36]: %timeit ranged_std(arr,starts,ends)
1000 loops, best of 3: 217 µs per loop
Case #3 : Large number of ranges 10000
In [60]: arr = np.random.randint(0,9,(1010))
In [61]: arr = np.random.randint(0,9,(10010))
In [62]: starts = np.random.randint(0,9,(10000))
In [63]: ends = starts + np.random.randint(0,9,(10000))
In [64]: %timeit [np.std(arr[i:j]) for i,j in zip(starts,ends)]
1 loops, best of 3: 474 ms per loop
In [65]: %timeit ranged_std(arr,starts,ends)
100 loops, best of 3: 2.17 ms per loop
Really amazing speedups of 200x+!
Using ranged_std to solve our case
# Get start, stop numeric indices as needed for getting ranges array later on
starts = asd_1.index.searchsorted(index_1)
ends = asd_1.index.searchsorted(index_2)
# Create final dataframe output using ranged_std func
df = pd.DataFrame(ranged_std(asd_1.values,starts,ends+1),index=index_1)
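The ends+1 is there because label slicing with .loc[i:j] includes the end label, while ranged_std expects half-open integer ranges. A NumPy-only sketch of the difference, on a made-up ten-day index, assuming (as in the question) that every end date is actually present in the index:

```python
import numpy as np

dates = np.arange("2011-01-01", "2011-01-11", dtype="datetime64[D]")  # 10 days
values = np.arange(10.0)

# searchsorted returns the position of each date in the sorted index
start = np.searchsorted(dates, np.datetime64("2011-01-03"))
end = np.searchsorted(dates, np.datetime64("2011-01-06"))

half_open = values[start:end]       # stops before 2011-01-06
inclusive = values[start:end + 1]   # includes 2011-01-06, like .loc[i:j]

print(half_open, inclusive)
```

If an end date were missing from the index, searchsorted would return the position of the next later date, and end + 1 would then pull in one element too many.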
Sample run for verification -
In [17]: asd_1 = pd.Series(0.01 * np.random.randn(252), index=\
...: pd.date_range('2011-1-1', periods=252))
...: index_1 = pd.to_datetime(['2011-2-2', '2011-4-3', '2011-5-1',])
...: index_2 = pd.to_datetime(['2011-2-15', '2011-4-16', '2011-5-17',])
...: index_tot = list(zip(index_1,index_2))
...: aux_learning_std = pd.DataFrame([np.nanstd(asd_1.loc[i:j]) for i, j in \
...: index_tot], index=index_1)
In [18]: starts = asd_1.index.searchsorted(index_1)
...: ends = asd_1.index.searchsorted(index_2)
...: df = pd.DataFrame(ranged_std(asd_1.values,starts,ends+1),index=index_1)
In [19]: aux_learning_std
2011-02-02 0.007244
2011-04-03 0.012862
2011-05-01 0.010155
In [20]: df
2011-02-02 0.007244
2011-04-03 0.012862
2011-05-01 0.010155
