I recently attended a guest lecture by Kevin Clarke (Stanford CS224n) on future trends in NLP. This post summarizes and discusses those trends; slide snippets are from his lecture.
Two primary topics lay out the current trends in NLP with deep learning:
1. Pre-Training using Unsupervised/Unlabeled data
2. OpenAI GPT-2 breakthrough
1. Pre-Training using Unsupervised/Unlabeled data
Supervised data is expensive and limited. How can we use unlabeled data to supplement training, followed by supervised fine-tuning, to do better?
Let's apply this to the problem of machine translation and see how it helps. Suppose you have two corpora of text (e.g., transcriptions or Wikipedia articles) in different languages, with no cross-language mapping between them.
We can use these for pre-training: train an encoder LSTM and a decoder LSTM (without attention) individually as language models on each corpus, fit them together into a single sequence-to-sequence model, and fine-tune on a labeled dataset.
How does this help? Both the encoder and decoder LSTMs have learned their respective language distributions and serve as good generative models for their languages. When you fit them together, the model learns to use the encoder's compressed representation to map from the source language to the target language. Pre-training here acts as a kind of "smart" initialization. A minimal sketch of this pre-train-then-combine recipe follows.
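To make the recipe concrete, here is a minimal sketch in PyTorch. The vocabulary sizes, dimensions, and training helpers are illustrative stand-ins, not the lecture's setup:

```python
import torch
import torch.nn as nn

VOCAB_SRC, VOCAB_TGT, EMB, HID = 1000, 1000, 64, 128  # toy sizes

class LstmLM(nn.Module):
    """LSTM language model; also reusable as a seq2seq encoder or decoder."""
    def __init__(self, vocab):
        super().__init__()
        self.embed = nn.Embedding(vocab, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, vocab)

    def forward(self, tokens, state=None):
        h, state = self.lstm(self.embed(tokens), state)
        return self.out(h), state

# Step 1: pre-train each LSTM as a plain language model (next-token
# prediction) on its own monolingual corpus -- entirely unlabeled data.
src_lm, tgt_lm = LstmLM(VOCAB_SRC), LstmLM(VOCAB_TGT)

def lm_step(model, batch, optim):
    logits, _ = model(batch[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), batch[:, 1:].reshape(-1))
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()

# Step 2: fit them together as encoder + decoder and fine-tune on the
# (small) labeled parallel corpus; the encoder's final state initializes
# the decoder, i.e. pre-training acts as "smart" initialization.
def translate_loss(src, tgt):
    _, enc_state = src_lm(src)                   # encode source sentence
    logits, _ = tgt_lm(tgt[:, :-1], enc_state)   # decode conditioned on it
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
```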
Let's take this further. The issue with pre-training is that the network was never trained to learn a source-to-target mapping. Self-training, applied as back-translation, helps here: we use an NMT (neural machine translation) model, NMT1, to translate from English to French, then feed NMT1's output back as a reversed input-output pair to a second model, NMT2. That is, given NMT1's output, NMT2 learns to generate NMT1's input. This is "augmented" supervised data: given the noisy output of another model, your network learns to predict that model's input. A sketch of one round is below.
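A hedged sketch of one back-translation round; `nmt_en_fr`, `nmt_fr_en`, and their `translate`/`train_step` methods are hypothetical interfaces for illustration, not a real API:

```python
def back_translation_round(nmt_en_fr, nmt_fr_en, english_corpus):
    for en_sentence in english_corpus:
        # NMT1 produces a (noisy) French translation of unlabeled English.
        fr_synthetic = nmt_en_fr.translate(en_sentence)
        # The pair is reversed: NMT2 learns to map NMT1's output back to
        # NMT1's input, i.e. synthetic French -> real English.
        nmt_fr_en.train_step(src=fr_synthetic, tgt=en_sentence)
```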
So far so good. Now let's look at a simpler approach that performs NMT with zero labeled data. It starts by learning word vectors in both languages; the nice thing about word vectors is that they have inherent structure. This allows us to learn an affine mapping, a weight matrix W in the image below, between the two languages' embedding spaces.
Words with similar meanings get mapped close together in the shared embedding space across languages. Plugging these aligned embeddings into our previous encoder and decoder, we can train the network to translate between languages, since we now know the meaning of words in both. A small sketch of learning W is shown below.
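Here is a minimal sketch of learning the mapping W, assuming `X` and `Y` are matrices of word vectors for a small seed dictionary (row i of `X` and row i of `Y` are translations of each other). It uses the closed-form orthogonal Procrustes solution via SVD; the mapping in the lecture may instead be learned by SGD or adversarially:

```python
import numpy as np

def learn_mapping(X, Y):
    # Solve min_W ||X W^T - Y||_F with W constrained to be orthogonal;
    # the optimum is W = U V^T where U S V^T = svd(Y^T X).
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt  # W, shape (dim, dim)

def translate_word(vec, W, tgt_vectors, tgt_words):
    # Map a source word vector into the target space, then take the
    # nearest target embedding by cosine similarity.
    mapped = W @ vec
    sims = tgt_vectors @ mapped / (
        np.linalg.norm(tgt_vectors, axis=1) * np.linalg.norm(mapped))
    return tgt_words[int(np.argmax(sims))]
```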
A summary of the objective functions of the optimization (a schematic sketch follows the list):
- The same encoder is trained twice: first to reproduce the source sentence in an autoencoder fashion, which helps the network further cement the mapping between the source and target languages via Src -> Target -> Src learning.
- Second, to perform back-translation, using the output of another network as (pseudo-)supervised data.
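A schematic sketch of composing these two objectives. The `model` interface, `noise` (word drop/shuffle corruption), and `xent` (token-level cross-entropy) are hypothetical stand-ins, not names from the lecture:

```python
def unsupervised_nmt_loss(model, x_src, noise, xent):
    # 1) Denoising autoencoding: reconstruct x_src from a corrupted copy.
    ae = xent(model.decode_src(model.encode(noise(x_src))), x_src)
    # 2) Back-translation: translate to the target language, then learn
    #    to recover the original source (Src -> Tgt -> Src).
    y_synth = model.decode_tgt(model.encode(x_src))
    bt = xent(model.decode_src(model.encode(y_synth)), x_src)
    return ae + bt
```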
Recently, a research team at Facebook augmented BERT's masked-language-model training objective: instead of masking words in a single language, the network is given inputs in two languages (a supervised translation pair) and words are masked in one of them. This forces the model to learn correspondences between the languages, and it provided a significant performance improvement; the resulting model is often referred to as cross-lingual BERT (XLM). A toy illustration of the masking setup is below.
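A toy illustration of that cross-lingual masking setup; the tokenization and the `[MASK]`/`[SEP]` conventions are simplified for illustration:

```python
import random

def make_tlm_example(en_tokens, fr_tokens, mask_prob=0.15):
    masked, labels = [], []
    for tok in en_tokens:                       # mask only the English side
        if random.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(tok)                  # the model must recover this
        else:
            masked.append(tok)
            labels.append(None)
    # The French side is left intact, so the model can use cross-lingual
    # context to fill in the English masks.
    return (masked + ["[SEP]"] + fr_tokens,
            labels + [None] * (len(fr_tokens) + 1))

tokens, labels = make_tlm_example(
    "the cat sat".split(), "le chat s'est assis".split())
```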
2. GPT-2
Next, let's look at OpenAI's GPT-2 model. It is basically a HUGE model (1.5 billion parameters) trained on quality data (text from highly rated outbound links on Reddit).
GPT-2 is trained purely as a language model but was evaluated on many tasks, i.e., evaluated as a zero-shot model.
(Zero-shot learning means trying to do a task without ever training on it.)
Now the question: how does GPT-2 perform translation even though it was never explicitly trained to translate? The answer can be seen in the corpus snippet below: in a fair amount of the training data, quotes appear translated into another language, so the model learns a mapping between the two languages, which it implicitly uses to perform translation. A small prompting sketch is below.
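As a hedged illustration (not from the lecture), here is how one might probe this behavior with the Hugging Face transformers library, using the "english = french" prompt format reported in the GPT-2 paper; the small public checkpoint will produce rough results at best:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Condition on a few translated pairs, then let the model complete one.
prompt = ("how are you? = comment allez-vous?\n"
          "good morning = bonjour\n"
          "thank you very much = ")
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=10,
                        pad_token_id=tokenizer.eos_token_id)
# Print only the newly generated continuation (the attempted translation).
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))
```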
Models have continued to grow in size and perform better on vision and language tasks. How long this trend will continue is a question we look forward to answering in the future :)