Initialize HuggingFace BERT with random weights

How can I initialize BERT with random weights? I want to compare the performance of multilingual vs. monolingual vs. randomly initialized BERT on a masked language modeling task. Loading the pretrained models is straightforward:
from transformers import BertTokenizer, BertForMaskedLM
tokenizer_multi = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model_multi = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased')
model_multi.eval()
tokenizer_mono = BertTokenizer.from_pretrained('bert-base-cased')
model_mono = BertForMaskedLM.from_pretrained('bert-base-cased')
model_mono.eval()
But I don't know how to load random weights.
Thanks in advance!

You can use the following function to re-initialize the weights in place:
import torch

def randomize_model(model):
    for name, module in model.named_modules():
        if isinstance(module, (torch.nn.Linear, torch.nn.Embedding)):
            module.weight.data.normal_(mean=0.0, std=model.config.initializer_range)
        elif isinstance(module, torch.nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)
        if isinstance(module, torch.nn.Linear) and module.bias is not None:
            module.bias.data.zero_()
    return model
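For example, a minimal usage sketch (reusing the checkpoint name from the question; the variable names are just illustrative): load one of the pretrained models to get the architecture and tokenizer, then re-randomize the weights.

from transformers import BertTokenizer, BertForMaskedLM

tokenizer_rand = BertTokenizer.from_pretrained('bert-base-cased')
model_rand = BertForMaskedLM.from_pretrained('bert-base-cased')
model_rand = randomize_model(model_rand)  # pretrained weights are overwritten in place
model_rand.eval()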

Related

BERT attribution scores for token probability prediction

I've been trying to find a library or an example for getting token importance when a BERT model predicts a masked span, e.g.:
from transformers import BertTokenizerFast, BertForMaskedLM
import torch
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
text = 'Brad Pitt is an [MASK] actor.'
tokenized_text = tokenizer.tokenize(text)
masked_index = tokenized_text.index("[MASK]")
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])
# Predict all tokens
with torch.no_grad():
    outputs = model(tokens_tensor)
    predictions = outputs[0]
probs = torch.nn.functional.softmax(predictions[0, masked_index], dim=-1)
You could then pick the highest predicted value, or the top 5 values.
How would I go about calculating, say, vanilla gradients or any other type of saliency method to see which tokens were important when predicting the masked token?
I read Ecco's documentation but they don't support attribution for BERT yet, AllenNLP has a demo for the MLM task but only for that demo, and I couldn't find anything relevant using SHAP or Captum.
Any help pointing me in the right direction would be appreciated.
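Not a full attribution library, but here is a minimal vanilla-gradients sketch (my own illustration, building on the code above): feed the model inputs_embeds that require gradients, backpropagate the logit of the predicted token, and use the gradient norm per input token as a saliency score.

import torch
from transformers import BertTokenizerFast, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

inputs = tokenizer('Brad Pitt is an [MASK] actor.', return_tensors='pt')
masked_index = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero().item()

# Look up the input embeddings ourselves so we can ask for their gradients.
embeddings = model.bert.embeddings.word_embeddings(inputs['input_ids']).detach()
embeddings.requires_grad_(True)

outputs = model(inputs_embeds=embeddings, attention_mask=inputs['attention_mask'])
logits = outputs[0]
predicted_id = logits[0, masked_index].argmax()

# Backpropagate the score of the predicted token to the input embeddings.
logits[0, masked_index, predicted_id].backward()

# Vanilla-gradient saliency: L2 norm of each token's embedding gradient.
saliency = embeddings.grad[0].norm(dim=-1)
for token, score in zip(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0]), saliency):
    print(f'{token:>12s}  {score.item():.4f}')

Captum's LayerIntegratedGradients over model.bert.embeddings can be wired up in the same way if you want something beyond plain gradients.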

graph isomorphism neural network

I am trying to understand graph isomorphism networks (GIN) and graph attention networks (GAT) in PyTorch for some classification tasks.
However, I can't find already implemented projects to read and understand as hints.
There are some for GCN, and they are fine.
I wanted to know if anyone can suggest any kind of material, other than raw theoretical papers, that I could refer to.
Graph Isomorphism Networks (GIN) can be built using the TensorFlow and Spektral libraries.
Here is an example of a GIN network built with those libraries:
from spektral.layers import GINConv, GlobalAvgPool
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

class GIN0(Model):
    def __init__(self, channels, n_layers):
        super().__init__()
        self.conv1 = GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
        self.convs = []
        for _ in range(1, n_layers):
            self.convs.append(
                GINConv(channels, epsilon=0, mlp_hidden=[channels, channels])
            )
        self.pool = GlobalAvgPool()
        self.dense1 = Dense(channels, activation="relu")

    def call(self, inputs):
        x, a, i = inputs
        x = self.conv1([x, a])
        for conv in self.convs:
            x = conv([x, a])
        x = self.pool([x, i])
        return self.dense1(x)
You can use this model for training and testing just like any other TensorFlow model, with some limitations.
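For example, a rough usage sketch with Spektral's disjoint-mode data loading. The dataset name, channels and n_layers below are placeholders, and note that the model as written outputs channels-dimensional graph embeddings, so for end-to-end classification you would replace the final Dense layer with a head sized to your number of classes before compiling and fitting as usual.

from spektral.data import DisjointLoader
from spektral.datasets import TUDataset

dataset = TUDataset('PROTEINS')               # any graph-level dataset works here
loader = DisjointLoader(dataset, batch_size=32, epochs=1)

model = GIN0(channels=64, n_layers=3)
for inputs, y in loader:
    out = model(inputs)                       # (n_graphs_in_batch, channels) graph embeddings
    print(out.shape)
    break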

Pytorch Geometric Graph Classification : AttributeError: 'Batch' object has no attribute 'local_var'

I am currently working on graph classification on the IMDB-Binary dataset using deep learning, specifically the PyTorch Geometric environment.
I have split my data into test/train samples that are lists of tuples containing a graph and its label. One thing I've had to do is treat the different graphs as a "Batch", a large disconnected graph, using torch_geometric.data.Batch. To start, I am using a data loader with the following collate function:
import torch
from torch_geometric.data import Batch

def collate(samples):
    graphs, labels = map(list, zip(*samples))
    datalist = make_datalist(graphs)
    datalist = Batch.from_data_list(datalist)
    return datalist, torch.tensor(labels)
and my classifier is the following:
class Classifier(nn.Module):
    def __init__(self, in_dim, hidden_dim, n_classes):
        super(Classifier, self).__init__()
        self.conv1 = GraphConv(in_dim, hidden_dim)
        self.conv2 = GraphConv(hidden_dim, hidden_dim)
        self.classify = nn.Linear(hidden_dim, n_classes)

    def forward(self, g):
        # Use node degree as the initial node feature. For undirected graphs, the in-degree
        # is the same as the out-degree.
        h = g.in_degrees
        # Perform graph convolution and activation function.
        h = F.relu(self.conv1(g, h))
        h = F.relu(self.conv2(g, h))
        g.ndata['h'] = h
        # Calculate graph representation by averaging all the node representations.
        hg = dgl.mean_nodes(g, 'h')
        return self.classify(hg)
This simply averages the node representations of each graph and feeds them to an MLP.
The problem is that during prediction on a batch I get the error
AttributeError: 'Batch' object has no attribute 'local_var'
and I can't find where it comes from. Would anyone know?
Thank you for taking the time to read!
I am also experimenting with PyTorch Geometric and its dataset capabilities.
Maybe the following information will help someone in the future:
I was facing AttributeErrors when I forgot to define @property-annotated getters/setters for my dataset class attributes. See https://docs.python.org/3.7/library/functions.html#property
I think that to answer your question we need more information about your make_datalist function.
However, here are the links to the Batch class:
https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html
https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/data/batch.html#Batch
And indeed, there is nothing like a local_var variable.
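For what it's worth, the classifier in the question mixes DGL calls (g.ndata, dgl.mean_nodes, and a GraphConv taking (g, h)) with a torch_geometric Batch; DGL's graph modules expect a DGLGraph, which is where local_var comes from, so that mismatch is a plausible source of the error. A hypothetical, purely PyTorch Geometric sketch of the same idea (node degree as input feature, two convolutions, mean pooling; class and variable names are mine) would look roughly like this:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GraphConv, global_mean_pool
from torch_geometric.utils import degree

class PyGClassifier(nn.Module):
    def __init__(self, hidden_dim, n_classes):
        super().__init__()
        self.conv1 = GraphConv(1, hidden_dim)          # one input feature: the node degree
        self.conv2 = GraphConv(hidden_dim, hidden_dim)
        self.classify = nn.Linear(hidden_dim, n_classes)

    def forward(self, data):                           # data is a torch_geometric Batch
        # Node degree as the initial node feature.
        h = degree(data.edge_index[0], data.num_nodes, dtype=torch.float).unsqueeze(-1)
        h = F.relu(self.conv1(h, data.edge_index))
        h = F.relu(self.conv2(h, data.edge_index))
        # Average the node representations per graph.
        hg = global_mean_pool(h, data.batch)
        return self.classify(hg)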

'Pipeline' object has no attribute 'feature_importances_'

I have a problem with my code: I want to see the feature importances for the vectors from my word2vec model, but I can't because it's a pipeline. Could someone help me find a solution, please?
## Import the random forest model and the pipeline.
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## This line instantiates the model.
rf = Pipeline([
    ("word2vec vectorizer", MeanEmbeddingVectorizer(w2v)),
    ("Random_forest", RandomForestClassifier(n_estimators=100, max_depth=6, random_state=0))])

## Fit the model on your training data.
rf.fit(X_train, y_train)

## And score it on your testing data.
rf.score(X_test, y_test)

X = model.wv.syn0
X = X.astype(int)

def plot_feat_imp(model, X):
    Feature_Imp = pd.DataFrame([X, rand_w2v_tfidf.feature_importances_]).transpose(
        ).sort_values(1, ascending=False)
    plt.figure(figsize=(14, 7))
    sns.barplot(y=Feature_Imp.loc[:, 0], x=Feature_Imp.loc[:, 1], data=Feature_Imp, orient='h')
    plt.title("Feature importance (what best explains satisfaction)", fontsize=21)
    plt.show()
    return
MY PROBLEM IS HERE:
plot_feat_imp(gbc_w2v, X)
AttributeError: 'Pipeline' object has no attribute 'feature_importances_'
Maybe not the answer you were looking for, but if you want the feature_importances_ of your pipeline object you first need to reach the final estimator inside it.
This is possible with:
rf_fit = rf.fit(X_train, y_train)
feature_importances = rf_fit.named_steps["Random_forest"].feature_importances_
Hope that helps.
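As a follow-up, here is a small plotting sketch (my own illustration, reusing the fitted pipeline rf from the question). Note that the "features" of the random forest here are word2vec dimensions, so the bar labels are dimension indices, not words:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

importances = rf.named_steps["Random_forest"].feature_importances_
feat_imp = (pd.DataFrame({"feature": [f"dim_{i}" for i in range(len(importances))],
                          "importance": importances})
            .sort_values("importance", ascending=False)
            .head(20))

plt.figure(figsize=(14, 7))
sns.barplot(y="feature", x="importance", data=feat_imp, orient="h")
plt.title("Top word2vec dimensions by random forest importance")
plt.show()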

Python equivalent to R caTools random 'sample.split'

Is there a Python (perhaps pandas) equivalent to R's
install.packages("caTools")
library(caTools)
set.seed(88)
split = sample.split(df$col, SplitRatio = 0.75)
that will generate exactly the same split?
My current context for this is, as an example, getting pandas DataFrames that correspond exactly to the R data frames (qualityTrain, qualityTest) created by:
# https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv
quality = read.csv("quality.csv")
set.seed(88)
split = sample.split(quality$PoorCare, SplitRatio = 0.75)
qualityTrain = subset(quality, split == TRUE)
qualityTest = subset(quality, split == FALSE)
I think scikit-learn's train_test_split function might work for you (link).
import pandas as pd
from sklearn.cross_validation import train_test_split
url = 'https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv'
quality = pd.read_csv(url)
train, test = train_test_split(quality, train_size=0.75, random_state=88)
qualityTrain = pd.DataFrame(train, columns=quality.columns)
qualityTest = pd.DataFrame(test, columns=quality.columns)
Unfortunately I don't get the same rows as the R function. I'm guessing it's the seeding, but could be wrong.
Splitting with sample.split from the caTools library means the class distribution is preserved. Scikit-learn's train_test_split does not guarantee that (it splits the dataset into random train and test subsets).
You can get an equivalent result to the R caTools library (regarding class distribution) by using sklearn.cross_validation.StratifiedShuffleSplit instead:
sss = StratifiedShuffleSplit(quality['PoorCare'], n_iter=1, test_size=0.25, random_state=0)
for train_index, test_index in sss:
    qualityTrain = quality.iloc[train_index, :]
    qualityTest = quality.iloc[test_index, :]
I know this is an old thread, but I just found it while looking for a potential solution. For a lot of online classes in stats and machine learning that are taught in R, if you want to use Python you run into this issue: the classes say to call set.seed() in R and then use something like caTools' sample.split, and you must get the same split or your later results won't match and you can't get the right answer for some quiz or exercise question. One of the main issues is that although both Python and R use the Mersenne Twister algorithm for their pseudo-random number generation by default, I discovered, by looking at the random states of their respective PRNGs, that they won't produce the same result given the same seed. And one (I forget which) uses signed numbers and the other unsigned, so it seems there is little hope of finding a seed for Python that would produce the same series of numbers as R.
A small correction to the above: StratifiedShuffleSplit is now part of sklearn.model_selection.
I have some data with X and Y in different numpy arrays. The distribution of 1s against 0s in my Y array is about 4.1%. If I use StratifiedShuffleSplit, it maintains this distribution in the test and train sets made afterwards. See below:
full_data_Y_np.sum() / len(full_data_Y_np)
0.041006701187937859
for train_index, test_index in sss.split(full_data_X_np, full_data_Y_np):
    X_train = full_data_X_np[train_index]
    Y_train = full_data_Y_np[train_index]
    X_test = full_data_X_np[test_index]
    Y_test = full_data_Y_np[test_index]
Y_train.sum() / len(Y_train)
0.041013925152306355
Y_test.sum() / len(Y_test)
0.040989847715736043
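For completeness, a sketch of the construction with the current API, applied to the quality DataFrame from the question (the class moved to sklearn.model_selection, n_iter became n_splits, and y is now passed to split() rather than to the constructor):

from sklearn.model_selection import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
for train_index, test_index in sss.split(quality, quality['PoorCare']):
    qualityTrain = quality.iloc[train_index, :]
    qualityTest = quality.iloc[test_index, :]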
