Fine-tuning a NER model in spaCy 3.0

I want to use the en_core_web_trf model in the spaCy library for named entity recognition. However, the guide for training a custom model does not contain information about fine-tuning a pretrained model.
How can one fine-tune an NER model in spaCy v3.0?

It's recommended you train your NER component from scratch rather than fine-tune the existing model, because fine-tuning the existing model is prone to catastrophic forgetting. Note that even if your NER component is trained from scratch, you're still using the Transformer as a starting point, so you aren't starting from nothing.
More details about why not to retrain the existing NER component, and how to do it if you really want to, are in the FAQ. There are also many threads about this topic in the project's Discussions.
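For reference, a minimal sketch of what that looks like with spaCy v3's CLI: generate a transformer-based config with a blank NER component and train it on your own annotated data. All file paths below are placeholders, and the accuracy preset only selects a transformer backbone if spacy-transformers is installed:

    # Create a config for a transformer + NER pipeline
    python -m spacy init config config.cfg --lang en --pipeline ner --optimize accuracy
    # Train the NER component from scratch on your annotated .spacy files
    python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy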

Related

How do I implement a knowledge base in a Huggingface model?

I made a knowledge base using COMET on the ATOMIC knowledge graph, using this tutorial.
I would like to include this knowledge in a regular pre-trained BERT model from Hugging Face to see how the model, with access to this knowledge, performs on a different task (sentiment analysis).
I saved the generated tuples from COMET in a pickle file.
Thanks!

Can I update spaCy's Entity Linking knowledge base after training?

Let's suppose I have successfully trained an Entity Linking model and it is working just fine. But eventually I'm going to update some aliases in the knowledge base. Just some aliases, not the descriptions, and no new entities.
I know that spaCy has a method to do so: kb.add_alias(alias="Emerson", entities=qids, probabilities=probs). But what if I have to do that after the training process? Should I re-run everything, or will updating the KB do?
The best thing is to try it and see.
If you're just adding new aliases, it really depends on how much they overlap with existing aliases. If there's no overlap, it won't make any difference; but if there is overlap, it could have resulted in different evaluations during training, which could have changed the model. Whether those differences are significant or not is hard to say.
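Mechanically, the update itself is simple. A rough sketch, assuming a spaCy 3.x KnowledgeBase saved to disk (the paths, vector length, and QID below are placeholders, and the entities must already exist in the KB):

    import spacy
    from spacy.kb import KnowledgeBase

    nlp = spacy.load("my_el_pipeline")  # placeholder pipeline name
    # entity_vector_length must match the value the KB was built with
    kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=64)
    kb.from_disk("my_kb")

    # Add an alias pointing at entities already in the KB; no retraining step here
    kb.add_alias(alias="Emerson", entities=["Q312545"], probabilities=[0.8])
    kb.to_disk("my_kb")

Whether the trained Entity Linker still behaves sensibly with the modified KB is the part you need to evaluate empirically.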

Transferring the hidden state of a RNN to another RNN

I am using reinforcement learning to teach an AI an Austrian card game with imperfect information called Schnapsen. For different states of the game, I have different neural networks (which use different features) that calculate the value/policy. I would like to try using RNNs, as past actions may be important for navigating future decisions.
However, as I use multiple neural networks, I somehow need to constantly transfer the hidden state from one RNN to another. I am not quite sure how to do that; in particular, during training I don't know how to make backpropagation through time work across the networks. I am grateful for any advice or links to related papers/blogs!
I am currently working with Flux in Julia, but I am also willing to switch to TensorFlow or PyTorch in Python.
Thank you in advance!
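In PyTorch, one way to sketch the mechanics: take the final hidden state of the first RNN, pass it through a learned projection (since the hidden sizes may differ), and use the result as the initial state of the second RNN. As long as you don't detach the state, backpropagation through time flows across both networks and the bridge. All sizes and names below are made up for illustration:

    import torch
    import torch.nn as nn

    rnn_a = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
    rnn_b = nn.GRU(input_size=8, hidden_size=64, batch_first=True)
    bridge = nn.Linear(32, 64)  # maps rnn_a's hidden state into rnn_b's space

    x_a = torch.randn(4, 10, 16)  # dummy input for the first phase of the game
    x_b = torch.randn(4, 10, 8)   # dummy input for the second phase

    out_a, h_a = rnn_a(x_a)         # h_a has shape (1, batch, 32)
    h_b0 = torch.tanh(bridge(h_a))  # projected initial hidden state for rnn_b
    out_b, h_b = rnn_b(x_b, h_b0)

    loss = out_b.sum()  # placeholder loss
    loss.backward()     # gradients flow through the bridge back into rnn_a

If you truncate BPTT between the networks (e.g. by calling h_a.detach()), the first RNN no longer learns from the second network's loss, which may or may not be what you want.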

How to build a model from scratch in FastAI.jl

I would like to define my own model in FastAI.jl rather than use one of the pre-trained ones (of which it looks like there are only one or two available, per the source code). I was reading through the docs, and everywhere a model is required they seem to reference a pre-trained ResNet model; I don't see a section about defining my own models. Is it possible to do this in FastAI.jl, and if so, how?

reusable holdout in mlr

How can one change the cross-validation or holdout procedure in mlr so that, before testing with the validation set, that same validation set is modified according to a given procedure, namely the reusable holdout procedure?
Procedure:
http://insilico.utulsa.edu/wp-content/uploads/2016/10/Dwork_2015_Science.pdf
Short answer: mlr doesn't support that.
Long answer: My experience with differential privacy for machine learning is that in practice it doesn't work as well as advertised. In particular, to apply thresholdout you need a) copious amounts of data and b) the a priori probability that a given classifier will overfit on the given data -- something you can't easily determine in practice. The paper you reference comes with example code showing that thresholdout works in this particular case, but the amount of noise added in the code looks like it was determined on an ad-hoc basis; the relationship to the thresholdout algorithm described in the paper isn't clear.
Until differential privacy can be robustly applied in practice in scenarios like this, mlr won't support it.
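For context, the core of the thresholdout mechanism from the paper fits in a few lines. Here's a rough sketch: the threshold and noise scale are illustrative defaults, and the budget accounting and threshold refreshing from the paper are omitted -- calibrating these is exactly the difficulty described above:

    import numpy as np

    def thresholdout(train_score, holdout_score, threshold=0.04, sigma=0.01, rng=None):
        rng = rng or np.random.default_rng()
        # If the training and holdout estimates agree up to noise, answer with
        # the training estimate and reveal nothing about the holdout set.
        if abs(train_score - holdout_score) < threshold + rng.normal(0, 4 * sigma):
            return train_score
        # Otherwise answer with a noised holdout estimate.
        return holdout_score + rng.normal(0, sigma)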
