Hyopil Shin (Graduate School of Data Science and Dept. of Linguistics, Seoul National University)
hpshin@snu.ac.kr
https://sites.google.com/snu.ac.kr/gsds-nlp/home
http://knlp.snu.ac.kr/
Tue/Thu 3:30 to 4:45, Building 942, Room 302
T.A.: 이상아 (visualjan@snu.ac.kr)
This course centers on the Transformer, the architecture that has become a game changer in natural language processing, and surveys its major applications. Starting from a theoretical examination of the Transformer, we review the architectures provided by Huggingface's Transformers library and study the most important models in depth. On this foundation, we explore Transformer-based applications such as Sentence-BERT, question answering, search, chatbots, multimodal models, and text classification/summarization. Students choose from the topics offered in the course, study and present the related papers and materials, and ultimately either implement a system based on them or write a paper suitable for presentation at a conference. To take this course, students must have completed Text and Natural Language Big Data Analytics / Computational Linguistics Research I or be familiar with the corresponding material; Python and PyTorch are required. This course is cross-listed as the NLP applications course in the Graduate School of Data Science and as Computational Linguistics Research II in the Department of Linguistics.
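Since Python and PyTorch fluency is assumed from day one, here is a minimal sketch of the Huggingface Transformers workflow used throughout the course: loading a pretrained model and inspecting its contextual embeddings. The checkpoint name `bert-base-uncased` is an illustrative choice, not a course requirement.

```python
# Minimal Huggingface Transformers sketch: tokenize a sentence with a
# pretrained BERT checkpoint and inspect its contextual embeddings.
# The checkpoint name is illustrative; any BERT-style model works.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Transformers changed NLP.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One hidden vector per (sub)token: (batch_size, sequence_length, 768)
print(outputs.last_hidden_state.shape)
```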
| Week | Date | Topics | Related Materials and Resources | Repositories |
|------|------|--------|---------------------------------|--------------|
| 1 | 3/2 & 3/4 | Introduction to Class · Encoder-Decoder Review · Attention Model (see the attention sketch after this table) | Transformer-based Encoder-Decoder Models · Attention: Illustrated Attention | PyTorch: |
| 2 | 3/9 & 3/11 | Introduction to Transformer · BERT (Bidirectional Encoder Representations from Transformers) | BERT Fine-Tuning · BERT Fine-Tuning Tutorial with PyTorch · BERT Word Embeddings · Transformers Explained Visually (Part 1): Overview of Functionality · Transformers Explained Visually (Part 2): How It Works, Step-by-Step · Transformers Explained Visually (Part 3): Multi-head Attention, Deep Dive · Master Positional Encoding: Part I · Rethinking Attention with Performers · From Transformers to Performers: Approximating Attention · Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity · Google Switch Transformers: Scaling to Trillion Parameter Models with Constant Computational Costs | PyTorch: The Annotated Transformer |
| 3 | 3/16 & 3/18 | Introduction to Huggingface Transformers · Some Models for Long Sequences | Transformers by Huggingface and Full Documentation | |
| 4 | 3/23 & 3/25 | Introduction to Huggingface Transformers | Huggingface Transformers Notebooks · Fine-Tuning BERT for Text Classification with FARM | |
| 5 | 3/30 & 4/1 | Sentence Embedding with Transformers | Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | GitHub - adsieg/text_similarity: Text Similarity |
| 6 | 4/6 & 4/8 | Sentence Embedding with Transformers | Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation · LaBSE: Language-Agnostic BERT Sentence Embeddings by Google AI · Billion-scale Semantic Similarity Search with FAISS+SBERT · How to Build Semantic Search with Transformers and FAISS | Facebook Faiss: library for efficient similarity search and clustering of dense vectors (see the semantic-search sketch after this table) |
| 7 | 4/13 & 4/15 | Search with Transformers | | |
| 8 | 4/20 & 4/22 | Search with Transformers | Introducing txtai, an AI-Powered Search Engine on Transformers · Deep Learning for Semantic Text Matching | txtai · tldrstory |
| 9 | 4/27 & 4/29 | Text Classification/Generation with Transformers | Siamese and Dual BERT for Multi Text Classification · GPT2 for Text Classification using Huggingface Transformers | |
| 10 | 5/4 & 5/6 | Text Classification/Generation with Transformers | Build a Bidirectional Text Generation Using PyTorch · Text Generation in Any Language with GPT-2 | |
| 11 | 5/11 & 5/13 | Summarization with Transformers | TLDR!! Summarize Articles and Content with NLP · PEGASUS: Google's State-of-the-Art Abstractive Summarization Model · Fine-Tuning a T5 Transformer for Any Summarization Task | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Zhang et al. · BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al. · Language Models are Unsupervised Multitask Learners by Radford et al. · Discourse-Aware Neural Extractive Text Summarization |
| 12 | 5/18 & 5/20 | Multimodal Transformers · TAPAS | Transformers with Tabular Data: How to Incorporate Tabular Data with Huggingface Transformers · Google Unveils TAPAS, a BERT-Based Neural Network for Querying Tables Using Natural Language · Google TAPAS is a BERT-Based Model to Query Tabular Data Using Natural Language | Multimodal Transformers: Transformers with Tabular Data · Weakly Supervised Table Parsing via Pre-training by Herzig et al. |
| 13 | 5/25 & 5/27 | QA with Transformers | BERT-Based Cross-Lingual Question Answering with DeepPavlov · How to Finetune mT5 to Create a Question Generator (for 100+ Languages) · Build an Open-Domain Question-Answering System with BERT in 3 Lines of Code · Sentence2MCQ using BERT Word Sense Disambiguation and T5 Transformer | Haystack: Neural Question Answering at Scale |
| 14 | 6/1 & 6/3 | Chatbot with Transformers | Chatbots Were the Next Big Thing: What Happened? (The Startup) · Chatbots are Cool! A Framework Using Python · Let's Build an Intelligent Chatbot · Make Your Own Rick Sanchez (Bot) with Transformers and DialoGPT · Fine-Tuning Blenderbot - Part 1: The Data · Blenderbot - Part 2: The Transformer | Recipes for Building an Open-domain Chatbot by Roller et al. |
| 15 | 6/8 & 6/10 | Final Presentations | | |
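As a companion to the attention material in weeks 1 and 2, the sketch below implements scaled dot-product attention in PyTorch, following the formulation in "Attention Is All You Need". The function name and tensor shapes are illustrative and not taken from any of the linked tutorials.

```python
# Minimal scaled dot-product attention sketch (weeks 1-2).
# Computes softmax(QK^T / sqrt(d_k)) V for a batch of sequences.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, seq_len, d_k). Returns (output, attention weights)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)             # rows sum to 1
    return weights @ v, weights

# Self-attention over 6 positions with 64-dimensional heads.
q = k = v = torch.randn(1, 6, 64)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([1, 6, 64]) torch.Size([1, 6, 6])
```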
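As a companion to the sentence-embedding and search material in weeks 5 to 8, here is a minimal semantic-search sketch combining Sentence-BERT embeddings (via the sentence-transformers library) with a FAISS inner-product index. The checkpoint name and toy corpus are illustrative; install `sentence-transformers` and `faiss-cpu` first.

```python
# Minimal SBERT + FAISS semantic-search sketch (weeks 5-8).
# Embeddings are L2-normalized so inner product equals cosine similarity.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")  # illustrative checkpoint

corpus = [
    "The Transformer relies entirely on attention.",
    "FAISS performs efficient similarity search over dense vectors.",
    "BERT produces contextual word embeddings.",
]
embeddings = model.encode(corpus, convert_to_numpy=True)
faiss.normalize_L2(embeddings)

index = faiss.IndexFlatIP(embeddings.shape[1])  # exact inner-product search
index.add(embeddings)

query = model.encode(["How do I search dense vectors quickly?"], convert_to_numpy=True)
faiss.normalize_L2(query)
scores, ids = index.search(query, 2)  # top-2 nearest sentences
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {corpus[i]}")
```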