Graph isomorphism neural network

I am trying to understand the graph isomorphism network (GIN) and the graph attention network (GAT) through PyTorch for some classification tasks.
However, I can't find existing implementations to read and use as hints.
There are some for GCN, and they are fine.
I would like to know whether anyone can suggest material other than raw theoretical papers that I can refer to.

Graph isomorphism networks (GIN) can be built using TensorFlow and the Spektral library.
Here is an example of a GIN built with these libraries:
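from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from spektral.layers import GINConv, GlobalAvgPool

class GIN0(Model):
    def __init__(self, channels, n_layers):
        super().__init__()
        # epsilon=0 gives the GIN-0 variant; mlp_hidden sets the hidden sizes of each GINConv's MLP
        self.conv1 = GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
        self.convs = []
        for _ in range(1, n_layers):
            self.convs.append(GINConv(channels, epsilon=0, mlp_hidden=[channels, channels]))
        self.pool = GlobalAvgPool()
        self.dense1 = Dense(channels, activation="relu")

    def call(self, inputs):
        # x: node features, a: adjacency matrix, i: index of the graph each node belongs to
        x, a, i = inputs
        x = self.conv1([x, a])
        for conv in self.convs:
            x = conv([x, a])
        x = self.pool([x, i])  # one vector per graph
        return self.dense1(x)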
You can use this model for training and testing just like any other TensorFlow model, with some limitations.
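Since the question asks for PyTorch while the answer uses TensorFlow/Spektral, it may help to see what the two layer types compute independently of any framework. Below is a minimal plain-Python sketch, assuming a graph given as an adjacency list: `gin_layer` applies the GIN update h_v' = MLP((1 + eps) * h_v + sum of neighbour features) (our reading is that `epsilon=0` above corresponds to eps = 0), `mean_readout` mirrors the average pooling, and `gat_layer` is a single-head GAT update with softmax attention over each node and its neighbours (the final ELU is omitted). The toy graph, `toy_mlp`, `W` and `a` are hypothetical placeholders; a real model would learn them with a tensor library.

```python
import math


def gin_layer(features, neighbors, mlp, eps=0.0):
    """One GIN update: h_v' = MLP((1 + eps) * h_v + sum of h_u over neighbours u)."""
    updated = []
    for v, h_v in enumerate(features):
        agg = [(1.0 + eps) * x for x in h_v]
        for u in neighbors[v]:
            agg = [a + b for a, b in zip(agg, features[u])]
        updated.append(mlp(agg))
    return updated


def mean_readout(features):
    """Graph embedding: element-wise mean of the node embeddings."""
    return [sum(col) / len(features) for col in zip(*features)]


def gat_layer(features, neighbors, W, a, slope=0.2):
    """Single-head GAT update (the usual final nonlinearity is left out)."""
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    def leaky_relu(z):
        return z if z > 0 else slope * z

    wh = [matvec(W, h) for h in features]      # W h for every node
    out, attention = [], []
    for v in range(len(features)):
        nbrs = [v] + list(neighbors[v])        # self-loop plus neighbours
        # e_vu = LeakyReLU(a . [W h_v || W h_u]); list "+" concatenates
        scores = [leaky_relu(sum(ai * zi for ai, zi in zip(a, wh[v] + wh[u]))) for u in nbrs]
        m = max(scores)                        # subtract the max for a stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        out.append([sum(al * wh[u][k] for al, u in zip(alpha, nbrs)) for k in range(len(wh[v]))])
        attention.append(alpha)
    return out, attention


# Toy path graph 0 - 1 - 2 with one scalar feature per node.
features = [[1.0], [2.0], [3.0]]
neighbors = [[1], [0, 2], [1]]
toy_mlp = lambda z: [max(0.0, 2.0 * z[0])]   # hypothetical stand-in for a trained MLP

h = gin_layer(features, neighbors, toy_mlp)  # GIN-0, i.e. eps = 0
g = mean_readout(h)
h_att, alpha = gat_layer(features, neighbors, W=[[1.0]], a=[0.0, 0.0])
```

With `a` set to zeros every attention score is equal, so each GAT output reduces to the plain mean of the node and its neighbours; learned attention vectors are what let GAT weight neighbours differently, whereas GIN always sums them.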


How to optimize with differential evolution using the Julia package Evolutionary.jl?

I ran into this problem after specifying a differential evolution (DE) algorithm and an initial population of multilayer perceptron (MLP) networks. The task is to evolve a population of MLPs with DE. I tried to use the Evolutionary package, but failed at this. I am just a beginner in Julia. Can anyone help me with this problem, or is there another way to implement DE to evolve MLPs? I don't know how to reuse code without seeing a similar example, and I can't find any Julia example that evolves an MLP with DE. The code is attached below.
# Here are the code snippets
using Flux, Evolutionary, MLDatasets, Random

features = Iris.features();
slabels = Iris.labels();
classes = unique(slabels)  # unique classes in the dataset
nclasses = length(classes)  # number of classes
d, n = size(features)  # dimension and size of the dataset

# define the MLP
model = Chain(Dense(d, 15, relu), Dense(15, nclasses))

# rewrite initial_population to generate a group of MLPs
import Evolutionary.initial_population
function initial_population(method::M, individual::Chain;
                            rng::Random.AbstractRNG=Random.default_rng(),
                            kwargs...) where {M<:Evolutionary.AbstractOptimizer}
    θ, re = Flux.destructure(individual)
    [re(randn(rng, Float32, length(θ))) for i in 1:Evolutionary.population_size(method)]
end
# define the DE algorithm (I just used random parameters)
algo2 = DE(
    selection = rouletteinv
)
popu = initial_population(algo2, model)
# In the source code of Evolutionary.jl, it seems that to use the optimize() function I need to pass a constraint?
# I am not sure. I have tried every method of the optimize function, but it still reported an error. What's worse,
# I am not sure how to use box constraints, so I tried the NoConstraints constraint, but it still failed. I don't
# know how to set the upper and lower bounds of a box constraint in this case, so I tried setting an arbitrary box
# constraint just to get optimize() to run, but it still failed. The reported error is in the attached picture.
cnst = BoxConstraints([0.5, 0.5], [2.0, 2.0])
res2 = Evolutionary.optimize(fitness, cnst, algo2, popu, opts)
# So far, what I have done is define a DE algorithm, an initial population and an MLP network. There is also
# uniform_mlp(), which deconstructs two MLPs into vectors, performs a crossover on them and reconstructs new MLPs
function uniform_mlp(m1::T, m2::T; rng::Random.AbstractRNG=Random.default_rng()) where {T <: Chain}
    θ1, re1 = Flux.destructure(m1)
    θ2, re2 = Flux.destructure(m2)
    c1, c2 = UX(θ1, θ2; rng=rng)
    return re1(c1), re2(c2)
end
# There is also a mutation function
function gaussian_mlp(σ::Real = 1.0)
    vop = gaussian(σ)
    function mutation(recombinant::T; rng::Random.AbstractRNG=Random.default_rng()) where {T <: Chain}
        θ, re = Flux.destructure(recombinant)
        return re(convert(Vector{Float32}, vop(θ; rng=rng)))
    end
    return mutation
end
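The idea behind uniform_mlp and gaussian_mlp is to flatten the network's parameters into one vector, apply vector operators, and rebuild a network. A minimal plain-Python sketch of that round trip follows, assuming a network stored as a list of (W, b) pairs with one row of W per output unit; `destructure`, `uniform_crossover` and `gaussian_mutation` are hypothetical helpers, not Flux or Evolutionary.jl functions, and the row-major flattening order here need not match `Flux.destructure`.

```python
import random


def destructure(layers):
    """Flatten [(W, b), ...] into one list of numbers and return a rebuild function."""
    shapes = [(len(W), len(W[0])) for W, _ in layers]
    theta = []
    for W, b in layers:
        for row in W:
            theta.extend(row)
        theta.extend(b)

    def restructure(vec):
        rebuilt, k = [], 0
        for rows, cols in shapes:
            W = [list(vec[k + r * cols: k + (r + 1) * cols]) for r in range(rows)]
            k += rows * cols
            b = list(vec[k: k + rows])
            k += rows
            rebuilt.append((W, b))
        return rebuilt

    return theta, restructure


def uniform_crossover(theta1, theta2, rng):
    """Swap each parameter between the two parents with probability 1/2."""
    child1, child2 = [], []
    for x, y in zip(theta1, theta2):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2


def gaussian_mutation(theta, sigma, rng):
    """Add independent N(0, sigma^2) noise to every parameter."""
    return [x + rng.gauss(0.0, sigma) for x in theta]


# A tiny 2 -> 3 -> 1 network given as [(W1, b1), (W2, b2)].
net = [([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [7.0, 8.0, 9.0]),
       ([[10.0, 11.0, 12.0]], [13.0])]
theta, re = destructure(net)
rng = random.Random(0)
c1, c2 = uniform_crossover(theta, [0.0] * len(theta), rng)
child = re(gaussian_mutation(c1, 0.1, rng))
```

Because the operators only ever see flat vectors, any vector-based DE, crossover or mutation can be reused for networks as long as every individual is rebuilt with the same shapes.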
The easiest way to use this is through Optimization.jl. There is an Evolutionary.jl wrapper (OptimizationEvolutionary) that exposes it through the standardized Optimization.jl interface. This looks like:
using Optimization, OptimizationEvolutionary
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
x0 = zeros(2)
p = [1.0, 100.0]
f = OptimizationFunction(rosenbrock)
prob = Optimization.OptimizationProblem(f, x0, p, lb = [-1.0,-1.0], ub = [1.0,1.0])
sol = solve(prob, Evolutionary.DE())
That said, given previous measurements of global optimizer performance, we would also recommend BlackBoxOptim's methods; switching only requires changing the optimizer passed to solve:
using Optimization, OptimizationBBO
sol = solve(prob, BBO_adaptive_de_rand_1_bin_radiuslimited(), maxiters=100000, maxtime=1000.0)
This is also a DE method, but one with an adaptive radius limit and other refinements, and it performs much better on average.
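To make the DE terminology concrete (for example the "rand_1_bin" in the BBO method name), here is a plain-Python sketch of the classic DE/rand/1/bin scheme on the same box-constrained Rosenbrock problem (p = [1.0, 100.0], bounds [-1, 1]). It is not the implementation used by Evolutionary.jl or BlackBoxOptim; the population size, F, CR, clipping to the box and the generation count are our own choices.

```python
import random


def rosenbrock(x, p=(1.0, 100.0)):
    return (p[0] - x[0]) ** 2 + p[1] * (x[1] - x[0] ** 2) ** 2


def de_rand_1_bin(f, lb, ub, pop_size=20, F=0.8, CR=0.9, generations=300, seed=0):
    """Classic DE/rand/1/bin minimiser with box bounds enforced by clipping."""
    rng = random.Random(seed)
    dim = len(lb)
    pop = [[rng.uniform(lb[j], ub[j]) for j in range(dim)] for _ in range(pop_size)]
    fit = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # mutation: three distinct random individuals, all different from i
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = []
            for j in range(dim):
                # binomial crossover between the mutant and the current individual
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, lb[j]), ub[j])  # keep the trial inside the box
                else:
                    v = pop[i][j]
                trial.append(v)
            # greedy selection: the trial replaces its parent if it is no worse
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


x_best, f_best = de_rand_1_bin(rosenbrock, lb=[-1.0, -1.0], ub=[1.0, 1.0])
```

Adaptive variants such as the BBO method above change how the step size and bounds are handled, but the mutate, crossover and select loop is the same.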

Initialize HuggingFace Bert with random weights

How is it possible to initialize BERT with random weights? I want to compare the performance of multilingual vs monolingual vs randomly initialized BERT in a masked language modeling task. While in the former cases it is very straightforward:
from transformers import BertTokenizer, BertForMaskedLM
tokenizer_multi = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model_multi = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased')
tokenizer_mono = BertTokenizer.from_pretrained('bert-base-cased')
model_mono = BertForMaskedLM.from_pretrained('bert-base-cased')
I don't know how to load random weights.
Thanks in advance!
You can use the following function:
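import torch

def randomize_model(model):
    for module_ in model.named_modules():
        if isinstance(module_[1], (torch.nn.Linear, torch.nn.Embedding)):
            # same scheme as BERT's own initialization: N(0, initializer_range)
            module_[1].weight.data.normal_(mean=0.0, std=model.config.initializer_range)
        elif isinstance(module_[1], torch.nn.LayerNorm):
            module_[1].bias.data.zero_()
            module_[1].weight.data.fill_(1.0)
        if isinstance(module_[1], torch.nn.Linear) and module_[1].bias is not None:
            module_[1].bias.data.zero_()
    return model

# e.g. model_rand = randomize_model(BertForMaskedLM.from_pretrained('bert-base-cased'))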

PyTorch Geometric graph classification: AttributeError: 'Batch' object has no attribute 'local_var'

I am currently working on graph classification on the IMDB-Binary dataset using deep learning, specifically the PyTorch Geometric environment.
I have split my data into test/train samples, which are lists of tuples each containing a graph and its label. One thing I've had to do is treat the different graphs as a "Batch", i.e. one large disconnected graph. To start, I am using a data loader with the following collate function:
def collate(samples):
    graphs, labels = map(list, zip(*samples))
    datalist = make_datalist(graphs)
    datalist = Batch.from_data_list(datalist)
    return datalist, torch.tensor(labels)
And my classifier is the following:
class Classifier(nn.Module):
    def __init__(self, in_dim, hidden_dim, n_classes):
        super(Classifier, self).__init__()
        self.conv1 = GraphConv(in_dim, hidden_dim)
        self.conv2 = GraphConv(hidden_dim, hidden_dim)
        self.classify = nn.Linear(hidden_dim, n_classes)

    def forward(self, g):
        # Use node degree as the initial node feature. For undirected graphs, the in-degree
        # is the same as the out_degree.
        h = g.in_degrees
        # Perform graph convolution and activation function.
        h = F.relu(self.conv1(g, h))
        h = F.relu(self.conv2(g, h))
        g.ndata['h'] = h
        # Calculate graph representation by averaging all the node representations.
        hg = dgl.mean_nodes(g, 'h')
        return self.classify(hg)
It simply averages the node representations of each graph and feeds the result to an MLP.
The problem is that during prediction on a batch, I get the error
AttributeError: 'Batch' object has no attribute 'local_var'
and I can't find where it comes from. Does anyone know?
Thank you for taking the time to read!
I am also experimenting with PyTorch Geometric and its dataset capabilities.
Maybe the following information will help someone in the future:
I get AttributeErrors when I forget to define @property-annotated getters/setters for my dataset class attributes. See
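I think that to answer your question we need more information about your make_datalist function.
However, here are the links to the Batch class:
And indeed, it has no local_var attribute. Note that local_var is a method of DGL's DGLGraph, and the classifier above uses DGL APIs (g.in_degrees, g.ndata, dgl.mean_nodes), so it expects a DGL graph rather than a PyTorch Geometric Batch.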
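The "Batch as one large disconnected graph" idea from the question can be sketched in plain Python: node indices of each graph are shifted by the number of nodes that precede it, and a batch vector records which graph every node came from, so a mean readout can average per graph. This is a simplified illustration under our own data layout ((num_nodes, edge_list) tuples); PyTorch Geometric's `Batch.from_data_list` works on `edge_index` tensors, and the helper names here are hypothetical.

```python
def batch_graphs(graphs):
    """Merge graphs given as (num_nodes, edge_list) into one disconnected graph."""
    edges, batch, offset = [], [], 0
    for g_idx, (num_nodes, edge_list) in enumerate(graphs):
        # shift node ids so the graphs do not overlap
        edges.extend((u + offset, v + offset) for u, v in edge_list)
        batch.extend([g_idx] * num_nodes)  # graph id of every node
        offset += num_nodes
    return offset, edges, batch


def degrees(num_nodes, edges):
    """Node degrees of an undirected edge list (the initial node feature in the question)."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def mean_pool(values, batch, num_graphs):
    """Average node values per graph using the batch vector."""
    sums = [0.0] * num_graphs
    counts = [0] * num_graphs
    for value, g in zip(values, batch):
        sums[g] += value
        counts[g] += 1
    return [s / c for s, c in zip(sums, counts)]


graphs = [(3, [(0, 1), (1, 2)]), (2, [(0, 1)])]  # a 3-node path and a single edge
num_nodes, edges, batch = batch_graphs(graphs)
pooled = mean_pool(degrees(num_nodes, edges), batch, len(graphs))
```

Because no edge crosses between the shifted blocks, message passing on the merged graph never mixes information from different samples.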

Usage of tf.keras within R

I want to convert my artificial neural network implementations to the new TensorFlow 2 platform, of which Keras is an integral part (tf.keras). Are there any recommended sources that explain how to implement ANNs with TensorFlow 2/tf.keras within R?
Furthermore, why is there a separate keras package from F. Chollet when, as mentioned, Keras is now an integral part of TensorFlow?
Sorry if these are basic questions, but unfortunately my own searches were not successful.
From the original TensorFlow documentation I extracted the following Python code:
from tensorflow import keras

input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
added = keras.layers.add([x1, x2])
out = keras.layers.Dense(4)(added)
model = keras.models.Model(inputs=[input1, input2], outputs=out)
My own R conversion is:
k <- tf$keras
l <- k$layers
input1 <- k$layers$Input(shape = c(16,?))
x1 <- k$layers$Dense(units = 8, activation = "relu") (input1)
input2 <- k$layers$Input(shape = c(32,?))
x2 <- k$layers$Dense(units = 8, activation = "relu") (input2)
added <- k$layers$add(inputs = c(x1,x2))
I hope my question is not too stupid, but I have trouble translating a Python tuple (or scalar) into its R equivalent. So my question is: how must the shape argument of the input layers be written in R?
I think the following page should provide the answer to your question:
In essence, your code should stay the same if you are using Keras with a version or above. For more details, refer to the linked site above.
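As background for the shape question: in the Python code, `shape=(16,)` is a one-element tuple, not a scalar with a missing second value, so the R side only needs to express "one dimension of size 16". The snippet below shows the distinction in plain Python. On the R side, reticulate provides a `tuple()` helper for building Python tuples; whether it is needed here depends on how your keras/tensorflow R package converts arguments, which we have not verified.

```python
# In Python, a trailing comma is what makes a one-element tuple.
shape = (16,)
grouped = (16)   # parentheses alone only group an expression: this is the int 16

# Keras adds the (unknown) batch dimension in front, so a batch fed to
# Input(shape=(16,)) has the full shape (None, 16).
full_shape = (None,) + shape
```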

BSpline combined with explicit and external code components behaves differently

Below is sample code in which a BsplinesComp is combined with either an ExplicitComponent or an ExternalCodeComp.
Both do the same calculation, and the gradients of both components are computed with finite differences.
If I run the Bspline+ExplicitComp version, the result is reached within 2 or 3 iterations.
If I run the Bspline+ExternalCodeComp version, I have to wait much longer. In this case it computes the gradient of the output with respect to each input: for example, 9 control points are interpolated to 70 points in the B-spline component, so the external component has to be evaluated as many times as there are interpolated points (70 times).
So when the B-spline is combined with an expensive external code, finite differencing needs as many evaluations as there are interpolated points, which becomes the bottleneck of the computation.
Based on this, I have two questions:
1- If the external code component is based on the explicit component, what is the major difference that causes the different behaviour? (Both have an input of shape=70.)
2- In the scenario mentioned above, where the B-spline is combined with an expensive external code, is there a more efficient way of combining them than the one shown here?
MAIN CODE: the 'external' variable is the flag that toggles between the external and explicit code components; set it to True/False to run the two cases explained above.
from openmdao.components.bsplines_comp import BsplinesComp
from openmdao.api import IndepVarComp, Problem, ExplicitComponent, ExecComp, ExternalCodeComp
from openmdao.api import ScipyOptimizeDriver, SqliteRecorder, CaseReader
import matplotlib.pyplot as plt
import numpy as np

external = True  # True for the case with the external code comp., False for the explicit comp.


# Explicit component for the area-under-the-line calculation
class AreaComp(ExplicitComponent):

    def initialize(self):
        self.options.declare('lenrr', int)
        self.options.declare('rr', types=np.ndarray)

    def setup(self):
        self.add_input('h', shape=self.options['lenrr'])
        self.add_output('area')
        self.declare_partials(of='area', wrt='h', method='fd')

    def compute(self, inputs, outputs):
        rr = self.options['rr']
        outputs['area'] = np.trapz(rr, inputs['h'])


class ExternalAreaComp(ExternalCodeComp):

    def setup(self):
        self.add_input('h', shape=70)
        self.add_output('area')

        self.input_file = 'paraboloid_input.dat'
        self.output_file = 'paraboloid_output.dat'

        # providing these is optional; the component will verify that any input
        # files exist before execution and that the output files exist after.
        self.options['external_input_files'] = [self.input_file]
        self.options['external_output_files'] = [self.output_file]

        # the name of the external script is not given in the original ('')
        self.options['command'] = [
            'python', '', self.input_file, self.output_file
        ]

        # this external code does not provide derivatives, use finite difference
        self.declare_partials(of='*', wrt='*', method='fd')

    def compute(self, inputs, outputs):
        h = inputs['h']

        # generate the input file for the external code
        # (assumed format: two columns, the module-level grid rr and h)
        np.savetxt(self.input_file, np.column_stack([rr, h]))

        # the parent compute function actually runs the external code
        super(ExternalAreaComp, self).compute(inputs, outputs)

        # parse the output file from the external code and set the value of f_xy
        f_xy = np.loadtxt(self.output_file)
        outputs['area'] = f_xy


prob = Problem()
model = prob.model

n_cp = 9
# assumption: the definition of rr is missing from the original; it must have 70 points
rr = np.linspace(0.0, 1.0, 70)
lenrr = len(rr)

# Initialize the design variables
x = np.random.rand(n_cp)

model.add_subsystem('px', IndepVarComp('x', val=x))
# 9 control points interpolated to lenrr (70) points; names match the connections below
model.add_subsystem('interp', BsplinesComp(num_control_points=n_cp, num_points=lenrr,
                                           in_name='h_cp', out_name='h'))

if external:
    comp = ExternalAreaComp()
    model.add_subsystem('AreaComp', comp)
else:
    comp = AreaComp(lenrr=lenrr, rr=rr)
    model.add_subsystem('AreaComp', comp)

case_recorder_filename2 = 'cases4.sql'
recorder2 = SqliteRecorder(case_recorder_filename2)

model.connect('px.x', 'interp.h_cp')
model.connect('interp.h', 'AreaComp.h')
model.add_constraint('interp.h', lower=0.9, upper=1, indices=[0])

prob.driver = ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'SLSQP'
prob.driver.options['disp'] = True
#prob.driver.options['optimizer'] = 'COBYLA'
#prob.driver.options['disp'] = True
prob.driver.options['tol'] = 1e-9

model.add_design_var('px.x', lower=1, upper=10)
model.add_objective('AreaComp.area')  # assumption: the objective line is missing from the original
model.add_recorder(recorder2)

prob.setup()
prob.run_driver()

cr = CaseReader(case_recorder_filename2)
case_keys = cr.system_cases.list_cases()
for case_key in case_keys:
    case = cr.system_cases.get_case(case_key)
The external code is below
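import sys
import numpy as np

if __name__ == '__main__':
    input_filename = sys.argv[1]
    output_filename = sys.argv[2]

    # assumed input format: two columns, rr and h
    rr, h = np.loadtxt(input_filename, unpack=True)
    rk = np.trapz(rr, h)

    # write the single area value back for the component to parse
    np.savetxt(output_filename, np.array([rk]))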
In both cases your code takes 3 iterations to run. The wall time for the external code is much longer simply because of the cost of file I/O plus the system call needed to spool up a new process each time your function is called.
Yep, system calls are that expensive, and file I/O isn't cheap either. If you have a more costly analysis it's less of a big deal, but you can see why it should be avoided if at all possible.
In this case you can reduce your FD cost, though. Since you have only 9 B-spline variables, you have correctly deduced that you could run far fewer FD steps. You want to use the approximate semi-total derivative feature in OpenMDAO v2.4 to set up FD across the group instead of across each individual component.
It's as simple as this:
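if external:
    comp = ExternalAreaComp()
    model.add_subsystem('AreaComp', comp)
else:
    comp = AreaComp(lenrr=lenrr, rr=rr)
    model.add_subsystem('AreaComp', comp)

# finite-difference across the whole model (9 design variables)
# instead of across each component (70 inputs to AreaComp)
model.approx_totals(method='fd')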
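To see why finite differencing across the group is cheaper here, the plain-Python sketch below counts how often an "expensive" function runs under forward differences in two setups: perturbing the 9 control points and re-running spline plus expensive code, versus perturbing the 70 inputs of the expensive code directly. `interpolate` (a piecewise-linear stand-in for the B-spline) and `expensive_area` are hypothetical stand-ins, not OpenMDAO code.

```python
calls = {"expensive": 0}


def forward_diff_gradient(f, x, step=1e-6):
    """Forward-difference gradient: one baseline call plus one call per entry of x."""
    f0 = f(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += step
        grad.append((f(xp) - f0) / step)
    return grad


def interpolate(cp, n_points=70):
    """Hypothetical cheap stand-in for the B-spline: piecewise-linear, 9 -> 70 points."""
    out = []
    for k in range(n_points):
        t = k * (len(cp) - 1) / (n_points - 1)
        j = min(int(t), len(cp) - 2)
        w = t - j
        out.append((1 - w) * cp[j] + w * cp[j + 1])
    return out


def expensive_area(h):
    """Stands in for the external code (trapezoid rule, unit spacing); counts its runs."""
    calls["expensive"] += 1
    return sum(0.5 * (h[k] + h[k + 1]) for k in range(len(h) - 1))


cp = [1.0] * 9

# FD across the group: perturb the 9 control points, rerun spline + expensive code.
calls["expensive"] = 0
forward_diff_gradient(lambda c: expensive_area(interpolate(c)), cp)
group_calls = calls["expensive"]

# FD on the expensive component alone: perturb each of its 70 inputs.
calls["expensive"] = 0
forward_diff_gradient(expensive_area, interpolate(cp))
component_calls = calls["expensive"]
```

In this sketch group_calls is 10 (1 baseline + 9) and component_calls is 71 (1 baseline + 70). OpenMDAO's own bookkeeping may differ in the baseline evaluations, but the cost scales with the 9 design variables in one case and the 70 interpolated points in the other.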
