How can I use BERT for machine translation? - jupyter-notebook

I have a big problem. For my bachelor thesis I have to build a machine translation model with BERT,
but I am not getting anywhere right now.
Do you know of documentation or something else that could help me here?
I have read some papers in that direction, but maybe there is a documentation or tutorial that can help me.
Specifically, for my bachelor thesis I have to translate a summary of a text into a title.
I hope someone can help me.

BERT is not a machine translation model; BERT is designed to provide a contextual sentence representation that should be useful for various NLP tasks. Although there are ways to incorporate BERT into machine translation (https://openreview.net/forum?id=Hyl7ygStwB), it is not an easy problem and there are doubts about whether it really pays off.
From your question, it seems that you are not really doing machine translation, but automatic summarization. Similarly to machine translation, it can be approached using sequence-to-sequence models, but we do not call it translation in NLP.
For sequence-to-sequence modeling, there are different pre-trained models, such as BART or MASS. These should be much more useful than BERT.
Update in September 2022: There are multilingual BERT-like models, the best known being multilingual BERT and XLM-RoBERTa. When fine-tuned carefully, they can be used as a universal encoder for machine translation and enable so-called zero-shot machine translation: a model trained to translate from several source languages into English can, in the end, translate from all languages covered by the multilingual BERT-like model. The method is called SixT.
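For a concrete starting point, here is a minimal sketch of title generation framed as summarization, using the Hugging Face transformers library; the library choice and the facebook/bart-large-cnn checkpoint are my assumptions, since the answer only names BART and MASS in general.

```python
# Sketch only: assumes `pip install transformers` (plus PyTorch) and the
# public facebook/bart-large-cnn checkpoint. For the thesis task you would
# fine-tune such a model on (summary, title) pairs instead.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

summary = (
    "The project investigates sequence-to-sequence models that turn a short "
    "summary of a text into a suitable title, comparing pre-trained "
    "encoder-decoder models against encoder-only models such as BERT."
)
result = summarizer(summary, max_length=15, min_length=4, do_sample=False)
print(result[0]["summary_text"])  # a short, title-like output
```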

Related

Theory of automata prerequisites

I'm interested in automata theory to improve my understanding of programming and compiler design (I would like to create some simple syntaxes in my own projects, for example L-systems, AI, neural-net structures, and intelligent object-to-object conversation 'AI dialog'), but there are things I need to learn before I go forward.
There are a lot of new symbols and mathematical concepts I need to learn before studying automata theory. I could not copy and paste examples because of the symbols, and
I don't have the required reputation to post an image, so here's a link to a wiki article.
Context-free grammar article on Wikipedia
Under the heading "Proper CFGs" you can see some definitions. I don't understand them.
Could someone please tell me what this notation is called so I can Google it? Any other pointers or information would also be helpful, but just knowing a few key words will help. Also, if anyone knows of a comprehensive resource that can be accessed for free, e.g. an IIT video lecture on the subject of that notation, I would be eternally grateful, as I
can't afford tutoring or even textbooks at this time.
The resource I'm using at the moment for automata theory (for anyone who is interested) is the Theory of Automata IIT lectures on YouTube.
The symbols ∀ and ∃ are logical quantifiers, respectively meaning "for all" and "there exists".
Typically you are first introduced to them in a discrete mathematics course, though they're a part of predicate logic (also known as first-order logic); in my particular university's CS program, Discrete Math is a pre-requisite for Logic for Computer Science, which in turn is a pre-requisite for Formal Languages and Automata.
The star symbol * in the term (V ∪ Σ)* is studied in formal languages/automata theory itself: it is the Kleene star operator. Its input is an alphabet (a set of symbols), and it produces the set of all strings of zero or more symbols over that alphabet.
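To make the Kleene star concrete, here is a small illustration of my own (not from the answer above): the closure is infinite, so we enumerate it only up to a length bound.

```python
# Enumerate the Kleene closure of an alphabet up to max_len; the full set
# (V ∪ Σ)* is infinite, so any concrete listing needs a bound.
from itertools import product

def kleene_star(alphabet, max_len):
    yield ""  # the empty string ε is always in the closure
    for n in range(1, max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

print(list(kleene_star(["a", "b"], 2)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```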
A useful tool for studying formal languages and automata is JFLAP.
This topic, at the level you referred to in your link, is really only for mathematicians or graduate-level theoretical computer science students. The symbols you are referring to are just symbolic logic. If you are really interested in automata theory, I would recommend finding resources that explore the topic at a conceptual level and avoid complex logical statements. Or, if you really want to dive in, you can teach yourself symbolic logic, some set theory, and probably some modern algebra, and then tackle automata theory from there.
I read many books on the subject of languages and automata, including the Dragon books on compilers (and the much more pragmatic Let's Write a Compiler by Jack Crenshaw), but none of it really clicked until I read the classic Finite and Infinite Machines by Marvin Minsky. Being an old book, it does not cover the latest research and developments in the field at all, but he explains the state of the art of the 1960s in Automata, Neural Networks, Turing Machines, Functional Programming and Lambda Calculus, and the oft-neglected third wheel of String-Rewriting Systems. And the writing is exceptionally clear and engaging. IIRC Minsky even co-authored a robot story with Isaac Asimov, so he has some serious writing credentials.
Like I say, this book will not bring you up-to-date in any of these fields, but it's the best book I've found for explaining everything from the ground up. And it would provide a very firm basis for reading anything more recent. This book is in the bibliography of every book published since.

Machine Learning and Natural Language Processing [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Assume you know a student who wants to study Machine Learning and Natural Language Processing.
What specific computer science subjects should they focus on and which programming languages are specifically designed to solve these types of problems?
I am not looking for your favorite subjects and tools, but rather industry standards.
Example: I'm guessing that knowing Prolog and Matlab might help them. They also might want to study Discrete Structures*, Calculus, and Statistics.
*Graphs and trees. Functions: properties, recursive definitions, solving recurrences. Relations: properties, equivalence, partial order. Proof techniques, inductive proof. Counting techniques and discrete probability. Logic: propositional calculus, first-order predicate calculus. Formal reasoning: natural deduction, resolution. Applications to program correctness and automatic reasoning. Introduction to algebraic structures in computing.
This related stackoverflow question has some nice answers: What are good starting points for someone interested in natural language processing?
This is a very big field. The prerequisites mostly consist of probability/statistics, linear algebra, and basic computer science, although Natural Language Processing requires a more intensive computer science background to start with (frequently covering some basic AI). Regarding specific languages: Lisp was created "as an afterthought" for doing AI research, while Prolog (with its roots in formal logic) is especially aimed at Natural Language Processing, and many courses will use Prolog, Scheme, Matlab, R, or another functional language (e.g. OCaml is used for this course at Cornell), as they are very suited to this kind of analysis.
Here are some more specific pointers:
For Machine Learning, Stanford CS 229: Machine Learning is great: it includes everything, including full videos of the lectures (also up on iTunes), course notes, problem sets, etc., and it was very well taught by Andrew Ng.
Note the prerequisites:
Students are expected to have the following background: knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program; familiarity with basic probability theory; familiarity with basic linear algebra.
The course uses Matlab and/or Octave. It also recommends the following readings (although the course notes themselves are very complete):
Christopher Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
Richard Duda, Peter Hart and David Stork, Pattern Classification, 2nd ed. John Wiley & Sons, 2001.
Tom Mitchell, Machine Learning. McGraw-Hill, 1997.
Richard Sutton and Andrew Barto, Reinforcement Learning: An introduction. MIT Press, 1998
For Natural Language Processing, the NLP group at Stanford provides many good resources. The introductory course Stanford CS 224: Natural Language Processing includes all the lectures online and has the following prerequisites:
Adequate experience with programming and formal structures. Programming projects will be written in Java 1.5, so knowledge of Java (or a willingness to learn on your own) is required. Knowledge of standard concepts in artificial intelligence and/or computational linguistics. Basic familiarity with logic, vector spaces, and probability.
Some recommended texts are:
Daniel Jurafsky and James H. Martin. 2008. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Second Edition. Prentice Hall.
Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.
James Allen. 1995. Natural Language Understanding. Benjamin/Cummings, 2ed.
Gerald Gazdar and Chris Mellish. 1989. Natural Language Processing in Prolog. Addison-Wesley. (this is available online for free)
Frederick Jelinek. 1998. Statistical Methods for Speech Recognition. MIT Press.
The prerequisite computational linguistics course requires basic computer programming and data structures knowledge, and uses the same textbooks. The required artificial intelligence course is also available online along with all the lecture notes and uses:
S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Second Edition
This is the standard Artificial Intelligence text and is also worth reading.
I use R for machine learning myself and really recommend it. For this, I would suggest looking at The Elements of Statistical Learning, for which the full text is available online for free. You may want to refer to the Machine Learning and Natural Language Processing views on CRAN for specific functionality.
My recommendation would be any or all of these (depending on their amount and area of interest):
The Oxford Handbook of Computational Linguistics
Foundations of Statistical Natural Language Processing
Introduction to Information Retrieval
String algorithms, including suffix trees. Calculus and linear algebra. Various kinds of statistics. Artificial intelligence optimization algorithms. Data clustering techniques... and a million other things. This is a very active field right now, and what you need depends on what you intend to do.
It doesn't really matter what language you choose to work in. Python, for instance, has NLTK, which is a pretty nice free package for tinkering with computational linguistics.
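To give a taste of that tinkering, here is a minimal sketch of my own (resource names such as "punkt" vary slightly across NLTK versions, so treat the download calls as an assumption):

```python
# Sketch only: assumes `pip install nltk`; the tokenizer and tagger data are
# downloaded once, and exact resource names depend on your NLTK version.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("Colorless green ideas sleep furiously.")
print(nltk.pos_tag(tokens))
# e.g. [('Colorless', 'JJ'), ('green', 'JJ'), ('ideas', 'NNS'),
#       ('sleep', 'VBP'), ('furiously', 'RB'), ('.', '.')]
```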
I would say probability & statistics is the most important prerequisite. Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs) in particular are very important in both machine learning and natural language processing (of course, these subjects may be part of the course if it is introductory).
Then, I would say basic CS knowledge is also helpful, for example Algorithms, Formal Languages and basic Complexity theory.
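As a concrete illustration of why HMMs keep coming up, here is a tiny forward-algorithm sketch of my own; every probability in it is a made-up example number.

```python
# Toy two-state HMM. The forward algorithm computes P(observation sequence)
# by summing over all hidden-state paths in O(T * |states|^2) time.
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def forward(observations):
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states) * emit_p[s][obs]
                 for s in states}
    return sum(alpha.values())

print(forward(["walk", "shop", "clean"]))  # likelihood of this sequence
```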
The Stanford CS 224: Natural Language Processing course mentioned above also includes videos online (in addition to the other course materials). The videos aren't linked from the course website, so many people may not notice them.
Jurafsky and Martin's Speech and Language Processing http://www.amazon.com/Speech-Language-Processing-Daniel-Jurafsky/dp/0131873210/ is very good. Unfortunately the draft second edition chapters are no longer free online now that it's been published :(
Also, if you're a decent programmer, it's never too early to toy around with NLP programs. NLTK comes to mind (Python). It has an accompanying book that you can read free online and that was also published (by O'Reilly, I think).
How about Markdown and an Introduction to Parsing Expression Grammars (PEG) posted by cletus on his site cforcoding?
ANTLR seems like a good place to start for natural language processing. I'm no expert though.
Broad question, but I certainly think that a knowledge of finite state automata and hidden Markov models would be useful. That requires knowledge of statistical learning, Bayesian parameter estimation, and entropy.
Latent semantic indexing is a common and relatively recent tool for many machine learning problems. Some of the methods are rather easy to understand. There are a bunch of potential basic projects:
Find co-occurrences in text corpora for document/paragraph/sentence clustering.
Classify the mood of a text corpus.
Automatically annotate or summarize a document.
Find relationships among separate documents to automatically generate a "graph" among the documents.
EDIT: Nonnegative matrix factorization (NMF) is a tool that has grown considerably in popularity due to its simplicity and effectiveness. It's easy to understand. I currently research the use of NMF for music information retrieval; NMF has been shown to be useful for latent semantic indexing of text corpora as well. Here is one paper. PDF
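As a concrete example, here is a compact sketch with scikit-learn (my choice of library; the answer names no specific tool): tf-idf plus NMF pulls two crude topics out of a toy corpus.

```python
# Sketch only: assumes scikit-learn >= 1.0 (for get_feature_names_out);
# the corpus and the number of components are toy choices for illustration.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the guitar solo opened the rock concert",
    "the band played guitar and drums on stage",
    "stocks fell as markets reacted to the report",
    "investors sold stocks after the earnings report",
]
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

nmf = NMF(n_components=2, random_state=0)
doc_topics = nmf.fit_transform(X)          # document-by-topic weights
terms = tfidf.get_feature_names_out()
for i, topic in enumerate(nmf.components_):
    top_terms = [terms[j] for j in topic.argsort()[-3:][::-1]]
    print(f"topic {i}:", top_terms)        # music terms vs. finance terms
```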
Prolog will only help them academically; it is also limited to logic constraints and semantics-based NLP work. Prolog is not an industry-friendly language, so it is not yet practical in the real world. Matlab is likewise an academic tool; unless they are doing a lot of scientific or quant work, they won't have much need for it. To start off, they might want to pick up the Norvig book and enter the world of AI to get a grounding in all the areas. They should understand some basic probability, statistics, databases, operating systems, and data structures, and most likely have an understanding of and experience with a programming language. They need to be able to prove to themselves why AI techniques work and where they don't, then look at specific areas like machine learning and NLP in further detail. In fact, the Norvig book lists references after every chapter, so they already have a lot of further reading available. There is a lot of reference material available for guidance: the internet, books, and journal papers. Don't just read the book; try to build tools in a programming language and then extrapolate meaningful results. Did the learning algorithm actually learn as expected? If it didn't, why was that the case, and how could it be fixed?

How To: Pattern Recognition

I'm interested in learning more about pattern recognition. I know that's somewhat of a broad field, so I'll list some specific types of problems I would like to learn to deal with:
Finding patterns in a seemingly random set of bytes.
Recognizing known shapes (such as circles and squares) in images.
Noticing movement patterns given a stream of positions (Vector3)
This is a new area of experimentation for me personally, and to be honest, I simply don't know where to start :-) I'm obviously not looking for the answers to be provided to me on a silver platter, but some search terms and/or online resources where I can start to acquaint myself with the concepts of the above problem domains would be awesome.
Thanks!
PS: For extra credit, it would be grand if said resources provided code examples/discussion in C# :-) but they don't have to.
Hidden Markov Models are a great place to look, as well as Artificial Neural Networks.
Edit: You could take a look at NeuronDotNet, it's open source and you could poke around the code.
Edit 2: You can also take a look at ITK, it's also open source and implements a lot of these types of algorithms.
Edit 3: Here's a pretty good intro to neural nets. It covers a lot of the basics and includes source code (albeit in C++). The author implemented an unsupervised learning algorithm; I think you may be looking for a supervised backpropagation algorithm to train your network.
Edit 4: Another good intro, avoids really heavy math, but provides references to a lot of that detail at the bottom, if you want to dig into it. Includes pseudo-code, good diagrams, and a lengthy description of backpropagation.
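If you want to see the core move behind those backpropagation tutorials in a few lines, here is a toy sketch of my own (not from the linked articles): a single logistic neuron trained by gradient descent to learn OR, which is the single-layer special case of backpropagation.

```python
# One logistic neuron trained with gradient descent on the OR function.
import math, random

random.seed(0)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1, w2, b, lr = random.random(), random.random(), 0.0, 0.5

for _ in range(2000):
    for (x1, x2), target in data:
        p = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))  # sigmoid output
        g = p - target   # gradient of cross-entropy loss w.r.t. pre-activation
        w1 -= lr * g * x1
        w2 -= lr * g * x2
        b -= lr * g

for (x1, x2), target in data:
    p = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))
    print((x1, x2), "->", round(p, 2), "target:", target)
```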
This is kind of like saying "I'd like to learn more about electronics.. anyone tell me where to start?" Pattern Recognition is a whole field - there are hundreds, if not thousands of books out there, and any university has at least several (probably 10 or more) courses at the grad level on this. There are numerous journals dedicated to this as well, that have been publishing for decades ... conferences ..
You might start with Wikipedia:
http://en.wikipedia.org/wiki/Pattern_recognition
This is kind of an old question, but it's relevant so I figured I'd post it here :-) Stanford began offering an online Machine Learning class here - http://www.ml-class.org
OpenCV has some functions for pattern recognition in images.
You might want to look at this: http://opencv.willowgarage.com/documentation/pattern_recognition.html. (Broken link: the closest thing in the new docs is http://opencv.willowgarage.com/documentation/cpp/ml__machine_learning.html, although it is no longer what I'd call helpful documentation for a beginner; see other answers.)
However, I also recommend starting with Matlab, because OpenCV is not intuitive to use.
Lots of useful links on this page about computer-vision-related pattern recognition. Some of the links seem to be broken now, but you may still find it useful.
I am not an expert on this, but reading about Hidden Markov Models is a good way to start.
Beware false patterns! For any decently large data set you will find subsets that appear to have pattern, even if it is a data set of coin flips. No good process for pattern recognition should be without statistical techniques to assess confidence that the detected patterns are real. When possible, run your algorithms on random data to see what patterns they detect. These experiments will give you a baseline for the strength of a pattern that can be found in random (a.k.a "null") data. This kind of technique can help you assess the "false discovery rate" for your findings.
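One way to act on this advice, sketched in Python (my own illustration): measure the strongest "pattern" your statistic finds in pure coin flips before trusting it on real data.

```python
# Null-data baseline: even fair coin flips contain surprisingly long runs,
# so first record what "pattern strength" purely random data produces.
import random

def longest_run(bits):
    best = cur = 1
    for prev, nxt in zip(bits, bits[1:]):
        cur = cur + 1 if prev == nxt else 1
        best = max(best, cur)
    return best

random.seed(0)
trials = [longest_run([random.randint(0, 1) for _ in range(1000)])
          for _ in range(200)]
print("mean longest run in 1000 fair flips:", sum(trials) / len(trials))
# typically around 10; a claimed pattern must clearly beat this null baseline
```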
Learning pattern recognition is easier in Matlab: there are several examples, and there are ready-made functions to use. It is good for understanding the concepts and for experiments.
I would recommend starting with some MATLAB toolbox. MATLAB is an especially convenient place to start playing around with stuff like this due to its interactive console. A nice toolbox I personally used and really liked is PRTools (http://prtools.org); it has an implementation of pretty much every pattern recognition tool, and also some other machine learning tools (neural networks, etc.). The nice thing about MATLAB is that there are many other toolboxes you can try out as well (there is even a proprietary toolbox from Mathworks).
Once you feel comfortable enough with the different tools (and have found out which classifier performs best for your problem), you can start thinking about implementing the machine learning in a different application.

What programming language is appropriate for a data classification project?

I would like to easily implement a data classification project, so I'm looking for a language that provides libraries for that. Could you suggest a proper language?
Matlab is not exactly a programming language, but it's no doubt the easiest way to implement math-oriented programs. It has lots of toolboxes for classification (e.g., MLP, SVM) as well as optimization toolboxes.
There is a Python distribution called SciPy that has lots of tools for scientific programming and people have used it to do data classification. Some bioinformatics people have built Excel2SVM in Python.
If the focus of your work is on the data classification, not on developing software, then Python is a good choice, because you can be more productive than with languages like Java or C++.
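To show how little code a first classifier takes in Python, here is a minimal sketch with scikit-learn (my choice; the answers above predate it and name only SciPy). The bundled iris data set stands in for whatever data you actually need to classify.

```python
# Sketch only: assumes scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)   # an SVM classifier
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```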
I'd say you really need more information before choosing a language.
Where are you getting data from, what front end do you want to use (web / dedicated client) ?
C# could do just as good a job, as could any object-oriented language.
Cheers
(A little late coming, but I thought this answer should be here for the record).
WEKA and MALLET are two useful libraries for data classification that I've come across. I've used WEKA in a couple of projects and can say that it is pretty mature. Both these libraries are Java-based.

How to get started on Information Extraction?

Could you recommend a training path to get started and become very good at Information Extraction? I started reading about it for one of my hobby projects and soon realized that I would have to be good at math (algebra, statistics, probability). I have read some of the introductory books on different math topics (and it's so much fun). I'm looking for some guidance. Please help.
Update: Just to answer one of the comments: I am more interested in text information extraction.
Depending on the nature of your project, natural language processing and computational linguistics can both come in handy: they provide tools to measure and extract features from textual information, and to apply training, scoring, or classification.
Good introductory books include O'Reilly's Programming Collective Intelligence (the chapters on "searching and ranking", document filtering, and maybe decision trees).
Suggested projects utilizing this knowledge: POS (part-of-speech) tagging, and named entity recognition (the ability to recognize names, places, and dates in plain text). You can use Wikipedia as a training corpus, since most of the target information is already extracted into infoboxes; this might provide you with a limited amount of measurement feedback.
The other big hammer in IE is search, a field not to be underestimated. Again, O'Reilly's book provides some introduction to basic ranking; once you have a large corpus of indexed text, you can do some real IE tasks with it. Check out Peter Norvig's Theorizing from Data as a starting point and a very good motivator; maybe you could reimplement some of its results as a learning exercise.
As a forewarning, I think I'm obligated to tell you that information extraction is hard. The first 80% of any given task is usually trivial; however, the difficulty of each additional percentage point grows roughly exponentially, in both development and research time. It's also quite underdocumented: most of the high-quality info is currently in obscure white papers (Google Scholar is your friend), so do check them out once you've burned your hand a couple of times. But most importantly, do not let these obstacles throw you off; there are certainly big opportunities to make progress in this area.
I would recommend the excellent book Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. It covers a broad area of issues which form a great and up-to-date (2008) basis for Information Extraction and is available online in full text (under the given link).
I would suggest you take a look at the Natural Language Toolkit (nltk) and the NLTK Book. Both are available for free and are great learning tools.
You don't need to be good at math to do IE. You just need to understand how the algorithms work, experiment on the cases where you need optimal result performance, and work with the scale at which you need to achieve your target accuracy level. You are basically working with algorithms, programming, and aspects of CS/AI/machine learning theory, not writing a PhD paper on building a new machine learning algorithm where you have to convince someone by way of mathematical principles why the algorithm works, so I totally disagree with that notion. There is a difference between practice and theory; as we all know, mathematicians are stuck more on theory than on the practicality of algorithms for producing workable business solutions. You would, however, need to do some background reading, both books on NLP and journal papers, to find out what people have found in their results. IE is a very context-specific domain, so you would need to define first in what context you are trying to extract information. How would you define this information? What is your structured model? Suppose you are extracting from semi-structured and unstructured data sets. You would then also want to weigh whether you want to approach your IE with a standard human approach, which involves things like regular expressions and pattern matching, or with statistical machine learning approaches like Markov chains. You can even look at hybrid approaches.
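Here is what that standard human approach looks like in practice, as a bare-bones sketch; the patterns are deliberately simplistic and of my own invention.

```python
# Rule-based extraction pass: hand-written regular expressions, the
# non-statistical end of the IE spectrum described above.
import re

text = "Contact alice@example.com by 2010-03-12 or call +1-555-0100."

patterns = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "date":  r"\d{4}-\d{2}-\d{2}",
    "phone": r"\+\d[\d-]{7,}\d",   # requires a leading +, to keep it simple
}
for label, pattern in patterns.items():
    print(label, re.findall(pattern, text))
# email: ['alice@example.com'], date: ['2010-03-12'], phone: ['+1-555-0100']
```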
A standard process model you can follow for your extraction is to adapt a data/text-mining approach:
pre-processing: define and standardize your data for extraction from various or specific sources, cleansing it along the way
segmentation/classification/clustering/association: the black box where most of your extraction work will be done
post-processing: cleanse your data back to where you want to store or represent it as information
Also, you need to understand the difference between what is data and what is information, as you can reuse your discovered information as a source of data to build more information maps/trees/graphs. It is all very contextualized.
These are the standard steps: input -> process -> output.
If you are using Java/C++ there are loads of frameworks and libraries available you can work with.
Perl would be an excellent language to do your NLP extraction work with if you want to do a lot of standard text extraction.
You may want to represent your data as XML, or even as RDF graphs (Semantic Web); for your defined contextual model, you can build up relationship and association graphs that will most likely change as you make more and more extraction requests. Deploy it as a RESTful service if you want to treat it as a resource for documents. You can even link it to taxonomized data sets and faceted search, say using Solr.
Good sources to read are:
Handbook of Computational Linguistics and Natural Language Processing
Foundations of Statistical Natural Language Processing
Information Extraction Applications in Prospect
An Introduction to Language Processing with Perl and Prolog
Speech and Language Processing (Jurafsky)
Text Mining Application Programming
The Text Mining Handbook
Taming Text
Algorithms of Intelligent Web
Building Search Applications
IEEE Journal
Make sure you do a thorough evaluation before deploying such applications/algorithms into production, as they can recursively increase your data storage requirements. You could use AWS/Hadoop for clustering and Mahout for large-scale classification, among others. Store your data sets in MongoDB, or put unstructured dumps into Jackrabbit, etc. Try experimenting with prototypes first. There are various archives you can base your training on, say the Reuters corpus, TIPSTER, TREC, etc. You can even check out the Alchemy API, GATE, UIMA, OpenNLP, etc.
Building extractions from standard text is easier than from, say, a web document, so representation at the pre-processing step becomes even more crucial: it defines what exactly you are trying to extract from a standardized document representation.
Standard measures include precision, recall, and F1, amongst others.
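For concreteness, here are the three measures computed from raw counts (standard definitions; the counts are made up):

```python
# Precision, recall, and F1 from a confusion-matrix corner.
tp, fp, fn = 8, 2, 4   # true positives, false positives, false negatives

precision = tp / (tp + fp)   # 0.80: fraction of extracted items that are correct
recall = tp / (tp + fn)      # 0.67: fraction of correct items that were extracted
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean, ~0.73
print(f"P={precision:.2f}  R={recall:.2f}  F1={f1:.2f}")
```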
I disagree with the people who recommend reading Programming Collective Intelligence. If you want to do anything of even moderate complexity, you need to be good at applied math and PCI gives you a false sense of confidence. For example, when it talks of SVM, it just says that libSVM is a good way of implementing them.
Now, libSVM is definitely a good package but who cares about packages. What you need to know is why SVM gives the terrific results that it gives and how it is fundamentally different from Bayesian way of thinking ( and how Vapnik is a legend).
IMHO, there is no one solution to it. You should have a good grip on linear algebra, probability, and Bayesian theory. Bayes, I should add, is as important for this as oxygen is for human beings (it's a little exaggerated, but you get what I mean, right?). Also, get a good grip on machine learning. Just using other people's work is perfectly fine, but the moment you want to know why something was done the way it was, you will have to know something about ML.
Check these out for that:
http://pindancing.blogspot.com/2010/01/learning-about-machine-learniing.html
http://measuringmeasures.com/blog/2010/1/15/learning-about-statistical-learning.html
http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html
Okay, now that's three of them :)
The Wikipedia Information Extraction article is a quick introduction.
At a more academic level, you might want to skim a paper like Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text.
Take a look here if you need an enterprise-grade NER service. Developing an NER system (and its training sets) is a very time-consuming and highly skilled task.
This is a little off topic, but you might want to read Programming Collective Intelligence from O'Reilly. It deals indirectly with text information extraction, and it doesn't assume much of a math background.
