AttentionDecoderRNN without MAX_LENGTH - recurrent-neural-network

From the PyTorch Seq2Seq tutorial, http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#attention-decoder
We see that the attention mechanism is heavily reliant on the MAX_LENGTH parameter to determine the output dimensions of the attn -> attn_softmax -> attn_weights, i.e.
class AttnDecoderRNN(nn.Module):
def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
super(AttnDecoderRNN, self).__init__()
self.hidden_size = hidden_size
self.output_size = output_size
self.dropout_p = dropout_p
self.max_length = max_length
self.embedding = nn.Embedding(self.output_size, self.hidden_size)
self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
self.dropout = nn.Dropout(self.dropout_p)
self.gru = nn.GRU(self.hidden_size, self.hidden_size)
self.out = nn.Linear(self.hidden_size, self.output_size)
More specifically
self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
I understand that the MAX_LENGTH variable is the mechanism to reduce the no. of parameters that needs to be trained in the AttentionDecoderRNN.
If we don't have a MAX_LENGTH pre-determined. What values should we initialize the attn layer with?
Would it be the output_size? If so, then that'll be learning the attention with respect to the full vocabulary in the target language. Isn't that the real intention of the Bahdanau (2015) attention paper?

Attention modulates the input to the decoder. That is attention modulates the encoded sequence which is of the same length as the input sequence. Thus, MAX_LENGTH should be the maximum sequence length of all your input sequences.

Related

Finite Element Analysis with Gridap.jl; how to define external forces?

I'm following this tutorial in order to try and do an FEA of a model.msh that I have to see how it would deform given different external forces in different places.
There they define the weak form as
a(u,v) = ∫( ε(v) ⊙ (σ∘ε(u)) )*dΩ
l(v) = 0
and they state "The linear form is simply l(v) = 0 since there are not external forces in this example."
As mentioned, I would like to analyse the different deformation that different external forces would cause on my model, but I can't seem to find anywhere an example of this. Could someone help me on defining this linear form for external forces different than 0?
Thanks.
Maybe this helps you out. It was written in a hurry, so please do not mind if you encounter spelling mistakes or other beauty issues :)
# define where the output shall go
output_Path ="Output/3_Gridap/1_Lin_FEA/FE_8"
mkpath(output_Path)
output_Name ="pde_6"
using Gridap
# please load the model that is shown in: https://gridap.github.io/Tutorials/dev/pages/t001_poisson/
model = DiscreteModelFromFile("Code/Meshes/Data/possion.json")
# just in case you want to see the model using paraview
writevtk(model,"$(output_Path)/md")
order = 1
reffe = ReferenceFE(lagrangian,VectorValue{3,Float64},order)
V0 = TestFESpace(model,reffe;
conformity=:H1,
# to see which elements belongs to "bottom" open the model which is saved through "writevtk(model,"$(output_Path)/md")"
dirichlet_tags=["bottom"],
# activate/deactivate the boundary conditions
dirichlet_masks=[
(true, true, true), # clamp the bottom
])
# define displacement
clamping(x) = VectorValue(0.0,0.0,0.0)
U = TrialFESpace(V0,[clamping])
const E = 7e+7
const ν = 0.33
const λ = (E*ν)/((1+ν)*(1-2*ν))
const μ = E/(2*(1+ν))
σ(ε) = λ*tr(ε)*one(ε) + 2*μ*ε
degree = 2*order
Ω = Triangulation(model)
dΩ = Measure(Ω,degree)
# Neumann boundary conditions
# here we define the surface on which we want an external force
Γ_Tr = BoundaryTriangulation(model,tags=["triangle"])
dΓ_Tr = Measure(Γ_Tr,degree)
# a force shall be applied on the y-direction
f_Tr(x) = VectorValue(0.0, 1e+6, 0.0)
# mass forces due to gravity, the value is set quite high, such that an impact can be seen
mass_Forces(x) = VectorValue(0.0, -1e+7, 0.0)
# Weak form
a(u,v) = ∫( ε(v) ⊙ (σ∘ε(u)) )*dΩ
l(v) = ∫( v ⋅ mass_Forces )* dΩ + ∫( v ⋅ f_Tr )* dΓ_Tr
op = AffineFEOperator(a,l,U,V0)
uh = solve(op)
writevtk(Ω,"$(output_Path)/$(output_Name)",
cellfields=[
"uh" => uh,
"epsi" => ε(uh),
"sigma" => σ∘ε(uh)])

How to implement self and __init__() in julia

I would like to know what is the correct approach to implement self and __inti__() in Julia?
Example
class rectangle:
def __init__(self, length, breadth, height):
self.length = length
self.breadth = breadth
self.height = height
def get_area(self):
return self.length * self.breadth
r = rectangle(160, 20, 1000)
print("area is", r.get_area())
I have tried this in Julia, but it does neither fits the operation expectation nor the results.
struct rectangle
length
breadth
height
end
function __init__(rectangle)
rectangle.length = length
rectangle.breadth = breadth
rectangle.height = height
end
function get_area(rectangle)
return rectangle.length*rectangle.breadth
end
data_obj = __init__()
r = get_area(data_obj)
end
Please do suggest an appropriate approach to achieve the python example in Julia.
Thanks in advance!!
A bold move to just literally translate from Python. It doesn't work that way, obviously.
However, the following should be enough:
struct Rectangle{T}
length::T
breadth::T
height::T
end
area(rectangle) = rectangle.length * rectangle.breadth
r = Rectangle(160, 20, 1000)
println(area(r))
(The type parameter is not something you asked for, but recommended.)
Now, if you need to do something more than simply assign the fields, you can write an outer constructor:
function Rectangle(l, b, h)
...
return Rectangle(l, b, h)
end
But there's no need for this unless some actual logic is required.

Given two lists(A,B) with same length, How can I find the index(i) which makes the max(sum(A[:i],B[i:]),sum(A[i:],B[:i])) smallest?

I am working on an online challenge problem, and I can solve this problem with brute force, but when the length became very large, the runtime is significantly increased, I believe there must be a better algorithm to solve this problem, but it is just out of my hand. I appreciate any brilliant ideas.
If you are allowed to use numpy, by using numpy.cumsum method you can find store sum(A[:i]), sum(A[i:]), sum(B[:i]), and sum(B[i:]) values in four different arrays as follows
import numpy as np
A = [] # Array A
B = [] # Array B
A_start_to_i = np.cumsum(A) # A_start_to_i[i] = sum(A[:i])
A.reverse() # Reverse the order
A_i_to_end = np.cumsum(A) # A_i_to_end[i] = sum(A[i:])
B_start_to_i = np.cumsum(B) # B_start_to_i[i] = sum(B[:i])
B.reverse() # Reverse the order
B_i_to_end = np.cumsum(B) # B_i_to_end = sum(B[i:])
Now all you need to do is to create sum(A[:i], B[i:]) and sum(B[:i], A[i:]) and find the index with the minimum element.
first_array = A_start_to_i + B_i_to_end # first_array[i] = sum(A[:i], B[i:])
second_array = A_i_to_end + B_start_to_i # second_array[i] = sum(B[:i], A[i:])
# Find which array has the minimum element
idx = np.argmin([min(first_array), min(second_array)])
if idx == 0:
# First array has the minimum element
i = np.argmin(first_array)
else:
# Second array has the minimum element
i = np.argmin(second_array)

Error when implementing RBF kernel bandwidth differentiation in Pytorch

I'm implementing an RBF network by using some beginer examples from Pytorch Website. I have a problem when implementing the kernel bandwidth differentiation for the network. Also, Iwould like to know whether my attempt ti implement the idea is fine. This is a code sample to reproduce the issue. Thanks
# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable
def kernel_product(x,y, mode = "gaussian", s = 1.):
x_i = x.unsqueeze(1)
y_j = y.unsqueeze(0)
xmy = ((x_i-y_j)**2).sum(2)
if mode == "gaussian" : K = torch.exp( - xmy/s**2) )
elif mode == "laplace" : K = torch.exp( - torch.sqrt(xmy + (s**2)))
elif mode == "energy" : K = torch.pow( xmy + (s**2), -.25 )
return torch.t(K)
class MyReLU(torch.autograd.Function):
"""
We can implement our own custom autograd Functions by subclassing
torch.autograd.Function and implementing the forward and backward passes
which operate on Tensors.
"""
#staticmethod
def forward(ctx, input):
"""
In the forward pass we receive a Tensor containing the input and return
a Tensor containing the output. ctx is a context object that can be used
to stash information for backward computation. You can cache arbitrary
objects for use in the backward pass using the ctx.save_for_backward method.
"""
ctx.save_for_backward(input)
return input.clamp(min=0)
#staticmethod
def backward(ctx, grad_output):
"""
In the backward pass we receive a Tensor containing the gradient of the loss
with respect to the output, and we need to compute the gradient of the loss
with respect to the input.
"""
input, = ctx.saved_tensors
grad_input = grad_output.clone()
grad_input[input < 0] = 0
return grad_input
dtype = torch.cuda.FloatTensor
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)
# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(H, D_in).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
# I've created this scalar variable (the kernel bandwidth)
s = Variable(torch.randn(1).type(dtype), requires_grad=True)
learning_rate = 1e-6
for t in range(500):
# To apply our Function, we use Function.apply method. We alias this as 'relu'.
relu = MyReLU.apply
# Forward pass: compute predicted y using operations on Variables; we compute
# ReLU using our custom autograd operation.
# y_pred = relu(x.mm(w1)).mm(w2)
y_pred = relu(kernel_product(w1, x, s)).mm(w2)
# Compute and print loss
loss = (y_pred - y).pow(2).sum()
print(t, loss.data[0])
# Use autograd to compute the backward pass.
loss.backward()
# Update weights using gradient descent
w1.data -= learning_rate * w1.grad.data
w2.data -= learning_rate * w2.grad.data
# Manually zero the gradients after updating weights
w1.grad.data.zero_()
w2.grad.data.zero_()
However I get this error, which dissapears when I simply use a fixed scalar in the default input parameter of kernel_product():
RuntimeError: eq() received an invalid combination of arguments - got (str), but expected one of:
* (float other)
didn't match because some of the arguments have invalid types: (str)
* (Variable other)
didn't match because some of the arguments have invalid types: (str)
Well, you are calling kernel_product(w1, x, s) where w1, x and s are torch Variable while the definition of the function is: kernel_product(x,y, mode = "gaussian", s = 1.). Seems like s should be a string specifying the mode.

How would you index a table that is being initialized?

An example of what I desire:
local X = {["Alpha"] = 5, ["Beta"] = this.Alpha+3}
print(X.Beta) --> error: [string "stdin"]:1: attempt to index global 'this' (a nil value)
is there a way to get this working, or a substitute I can use without too much code bloat(I want it to look presentable, so fenv hacks are out of the picture)
if anyone wants to take a crack at lua, repl.it is a good testing webpage for quick scripts
No there is no way to do this because the table does not yet exist and there is no notion of "self" in Lua (except via syntactic sugar for table methods). You have to do it in two steps:
local X = {["Alpha"] = 5}
X["Beta"] = X.Alpha+3
Note that you only need the square brackets if your key is not a string or if it is a string with characters other than any of [a-z][A-Z][0-9]_.
local X = {Alpha = 5}
X.Beta = X.Alpha+3
Update:
Based on what I saw on your pastebin, you probably should do this slightly differently:
local Alpha = 5
local X = {
Alpha = Alpha,
Beta = Alpha+3,
Gamma = someFunction(Alpha),
Eta = Alpha:method()
}
(obviously Alpha has no method because in the example it is a number but you get the idea, just wanted to show if Alpha were an object).

Resources