108.535A: 컴퓨터언어학연구 II: 트랜스포머기반의 사전학습모델과 응용

(Studies on Computational Linguistics II: Transformers-based Pre-Trained Models and Applications)


M3239.004000: 자연어처리의 응용: 트랜스포머기반의 사전학습모델과 응용

(Applications of NLP: Transformers-based Pre-Trained Models and Applications)



Hyopil Shin (Dept. of Linguistics, Seoul National University)

hpshin@snu.ac.kr
http://knlp.snu.ac.kr/

Tue 4:00 to 7:00 in building 14, room 203

T.A.: 김은진 (jyej3154@snu.ac.kr)

[Images: Transformer architectures and the Hugging Face logo. Photo by Arseny Togulev on Unsplash]

Course Description

This course examines pre-trained models built around the Transformer, now the mainstream of natural language processing and computational linguistics, along with the application areas that make use of them. Rather than training models from scratch, recent NLP centers on building Transformer-based pre-trained models and applying them to a variety of downstream tasks. Starting from the history of pre-training, the course explores its relationship to transfer learning and self-supervised learning and gives a comprehensive overview of recent developments in pre-trained models. These developments, driven by computing power and large-scale data, are examined from the perspectives of effective architecture design, the use of diverse data, improved computational efficiency, and the interpretation and theoretical analysis of pre-trained models. On this basis, we cover applications of Transformer-based pre-trained models such as Sentence-BERT, question answering, search, and text classification/summarization. Students choose from the topics offered in the course, study and present the related papers and materials, and ultimately implement a system based on them or write a paper suitable for presentation at a conference. To take this course, students should have completed Text and Natural Language Big Data Analytics / Studies on Computational Linguistics I, or be familiar with the corresponding material. Python and PyTorch are basic requirements. This course is cross-listed as Applications of NLP (Graduate School of Data Science) and Studies on Computational Linguistics II (Department of Linguistics).

Updates


  • Due to the spread of the Omicron variant, the course will begin online via Zoom, but it may switch to in-person or hybrid instruction as circumstances change. The Zoom lecture link will be announced through ETL at the start of the semester.
  • Lecture materials and Jupyter notebooks will be posted on ETL.

Useful Sites

  • Lectures

Textbook and Sites

  • Huggingface Site
  • Huggingface Transformers

Syllabus


Each entry below lists the week and date, the topics, the related materials and resources, and the repositories.

Week 1 (3/1): Introduction to Class

Week 2 (3/8): Encoder-Decoder Review; Attention Model
Related Materials and Resources:
  • Transformer-based Encoder-Decoder Models
  • Attention: Illustrated Attention
Repositories (a minimal attention sketch follows this entry):
  • PyTorch: pytorch-seq2seq
    • Sequence to Sequence Learning with Neural Networks
    • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
    • Neural Machine Translation by Jointly Learning to Align and Translate
    • Packed Padded Sequences, Masking, Inference and BLEU
    • Convolutional Sequence to Sequence Learning
    • Attention is All You Need
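
A minimal sketch of the scaled dot-product attention at the heart of the models above, in plain PyTorch; the tensor sizes and toy batch are illustrative assumptions, not values from the course notebooks.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, mask=None):
        # q, k, v: (batch, seq_len, d_k); mask entries of 0 are blocked from attention
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5      # query-key similarities
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)                # attention distribution over keys
        return weights @ v, weights                        # weighted sum of values

    q = k = v = torch.randn(2, 5, 64)                      # toy batch: 2 sequences, 5 tokens, 64 dims
    context, attn = scaled_dot_product_attention(q, k, v)
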
Week 3 (3/15): Introduction to Transformer I
Related Materials and Resources (a positional-encoding sketch follows this entry):
  • Transformers Explained Visually (Part 1): Overview of Functionality
  • Transformers Explained Visually (Part 2): How it works, step-by-step
  • Transformers Explained Visually (Part 3): Multi-head Attention, deep dive
  • Master Positional Encoding: Part I
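
To go with the positional-encoding reading, a short sketch of the sinusoidal positional encoding from "Attention is All You Need"; the lengths and model dimension are illustrative assumptions (d_model is taken to be even).

    import math
    import torch

    def sinusoidal_positional_encoding(max_len, d_model):
        # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
        # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        return pe                                          # (max_len, d_model), added to token embeddings

    pe = sinusoidal_positional_encoding(max_len=128, d_model=512)
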


Week 4 (3/22): BERT (Bidirectional Encoder Representations from Transformers)
Related Materials and Resources (a minimal fine-tuning sketch follows this entry):
  • Bert Fine-Tuning
  • BERT Fine-Tuning Tutorial with PyTorch
  • BERT Word Embeddings
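
A minimal sketch of one fine-tuning step for sequence classification with the Huggingface transformers library; the bert-base-uncased checkpoint and the two-example toy batch are assumptions for illustration (the tutorial above covers a full training loop and evaluation).

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    batch = tokenizer(["a great movie", "a boring movie"], padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])                          # hypothetical sentiment labels

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    outputs = model(**batch, labels=labels)                # forward pass returns loss and logits
    outputs.loss.backward()                                # one gradient step on the toy batch
    optimizer.step()
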


Week 5 (3/29): Pre-trained Models: Designing Effective Architecture
Topics:
  • Combining Autoregressive and Autoencoding Modeling
  • Applying Generalized Encoder-Decoder
Related Materials and Resources:
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding
  • GLM: All NLP Tasks Are Generation Tasks: A General Pretraining Framework
  • MASS: Masked Sequence to Sequence Pre-training for Language Generation
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  • PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
  • PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation

Week 6 (4/5): Pre-trained Models: Designing Effective Architecture
Topics:
  • Cognitive-Inspired Architectures
  • More Variants of Existing PTMs: Masking Strategy
Related Materials and Resources:
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
  • CogQA: Cognitive Graph for Multi-Hop Reading Comprehension at Scale
  • Language Models as Knowledge Bases?
  • REALM: Retrieval-Augmented Language Model Pre-Training
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans
  • ERNIE (1.0): Enhanced Representation through Knowledge Integration
  • ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Week 7 (4/12): Pre-trained Models: Utilizing Multi-Source Data
Topics:
  • Multilingual Pre-Training
  • Multimodal Pre-Training
Related Materials and Resources:
  • XLM-R: Unsupervised Cross-lingual Representation Learning at Scale
  • Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
  • Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
  • Zero-Shot Text-to-Image Generation
  • Learning Transferable Visual Models From Natural Language Supervision

Week 8 (4/19): Pre-trained Models: Utilizing Multi-Source Data
Topics:
  • Knowledge-Enhanced Pre-Training
Related Materials and Resources:
  • KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
  • ERNIE (1.0): Enhanced Representation through Knowledge Integration
  • ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
  • KnowBERT: Knowledge Enhanced Contextual Word Representations
  • KGLM: Using Knowledge Graphs for Fact-Aware Language Modeling
  • A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation
  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Week 9 (4/26): Pre-trained Models: Improving Computational Efficiency
Topics:
  • System-Level Optimization
  • Efficient Pre-Training
  • Model Compression
Related Materials and Resources:
  • Mixed Precision Training
  • SwapAdvisor: Push Deep Learning Beyond the GPU Memory Limit via Smart Swapping
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  • Train No Evil: Selective Masking for Task-Guided Pre-Training
  • Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
  • GroupBERT: Enhanced Transformer Architecture with Efficient Group Structures
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
  • Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  • MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
  • TernaryBERT: Distillation-aware Ultra-low Bit BERT

Week 10 (5/3): Pre-trained Models: Interpretation and Theoretical Analysis
Topics:
  • Knowledge of PTMs: Linguistic Knowledge
  • Knowledge of PTMs: World Knowledge
  • Robustness of PTMs
  • Structural Sparsity of PTMs
  • Theoretical Analysis of PTMs
Related Materials and Resources:
  • A Structural Probe for Finding Syntax in Word Representations
  • Linguistic Knowledge and Transferability of Contextual Representations
  • What Does BERT Learn about the Structure of Language?
  • Open Sesame: Getting Inside BERT's Linguistic Knowledge
  • Evaluating Commonsense in Pre-Trained Language Models
  • What BERT is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models
  • Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
  • Trick Me If You Can: Human-in-the-loop Generation of Adversarial Examples for Question Answering
  • What Does BERT Look At? An Analysis of BERT's Attention
  • Revealing the Dark Secrets of BERT
  • Why Does Unsupervised Pre-training Help Deep Learning?
  • A Theoretical Analysis of Contrastive Unsupervised Representation Learning

Week 11 (5/10): Introduction to Huggingface Transformers
Topics:
  • Summary of Tasks: Sequence Classification, Extractive Question Answering, Language Modeling, Text Generation, Named Entity Recognition, Summarization, and Translation (a pipeline sketch follows this entry)
Related Materials and Resources:
  • Introduction to Huggingface Transformers
  • Sentence Embedding with Transformers
Repositories:
  • GitHub - adsieg/text_similarity: Text Similarity
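
A hedged sketch of the Huggingface pipeline API for a few of the tasks listed above; the pipelines fall back to the library's default public checkpoints (gpt2 is named explicitly), chosen here only for illustration.

    from transformers import pipeline

    # Sequence classification (sentiment analysis)
    classifier = pipeline("sentiment-analysis")
    print(classifier("This seminar covers Transformer pre-training in depth."))

    # Named entity recognition
    ner = pipeline("ner")
    print(ner("Hugging Face is based in New York City."))

    # Text generation
    generator = pipeline("text-generation", model="gpt2")
    print(generator("Pre-trained language models", max_length=20)[0]["generated_text"])
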


Week 12 (5/17): Sentence Embedding with Transformers
Related Materials and Resources (a sentence-embedding search sketch follows this entry):
  • Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
  • LaBSE: Language-Agnostic BERT Sentence Embeddings by Google AI
  • Billion-scale Semantic Similarity Search with FAISS+SBERT
  • How to Build Semantic Search with Transformers and FAISS
Repositories:
  • Facebook Faiss: Library for efficient similarity search and clustering of dense vectors
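
A minimal sketch of semantic search with sentence-transformers embeddings and a FAISS index, in the spirit of the FAISS+SBERT articles above; the checkpoint name and toy documents are assumptions for illustration.

    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")        # any SBERT-style checkpoint works
    docs = ["Transformers power modern NLP.",
            "FAISS searches dense vectors efficiently.",
            "Seoul is the capital of Korea."]

    emb = model.encode(docs, normalize_embeddings=True)    # (n_docs, dim), L2-normalized
    index = faiss.IndexFlatIP(emb.shape[1])                # inner product = cosine on normalized vectors
    index.add(np.asarray(emb, dtype="float32"))

    query = model.encode(["fast similarity search"], normalize_embeddings=True)
    scores, ids = index.search(np.asarray(query, dtype="float32"), k=2)
    print([docs[i] for i in ids[0]])                       # nearest documents to the query
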
Week 13 (5/24): Search with Transformers; Text Classification/Generation with Transformers
Repositories (a zero-shot classification sketch follows this entry):
  • txtai
  • tldrstory
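
A sketch of Transformer-based text classification using the Huggingface zero-shot pipeline rather than txtai's or tldrstory's own APIs; the NLI checkpoint is one common public choice, assumed for illustration.

    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    result = classifier(
        "The index returns the most similar documents for a query.",
        candidate_labels=["search", "sports", "cooking"],
    )
    print(result["labels"][0], result["scores"][0])        # highest-scoring label first
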
Week 14 (5/31): Summarization with Transformers; Multimodal Transformers; TAPAS
Related Materials and Resources (a summarization sketch follows this entry):
  • TLDR!! Summarize Articles and Content With NLP
  • PEGASUS: Google's State of the Art Abstractive Summarization Model
  • Fine Tuning a T5 Transformer for Any Summarization Task
  • Summarize Reddit Comments using T5, BART, GPT-2, XLNet Models
  • DiscoBERT: A BERT that Shortens Your Reading Time
  • Transformers with Tabular Data: How to Incorporate Tabular Data with Huggingface Transformers
  • Google Unveils TAPAS, a BERT-based Neural Network for Querying Tables Using Natural Language
  • Google TAPAS is a BERT-based Model to Query Tabular Data Using Natural Language
  • PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Zhang et al.
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al.
  • Language Models are Unsupervised Multitask Learners by Radford et al.
  • Discourse-Aware Neural Extractive Text Summarization
  • Weakly Supervised Table Parsing via Pre-training by Herzig et al.
Repositories:
  • Multimodal Transformers | Transformers with Tabular Data
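
A minimal sketch of abstractive summarization with a pre-trained seq2seq checkpoint through the Huggingface pipeline; facebook/bart-large-cnn is one common public choice, assumed here for illustration.

    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    article = (
        "Pre-trained Transformer models such as PEGASUS, BART, and T5 are fine-tuned "
        "on summarization corpora and can then condense long articles into a few sentences. "
        "This sketch simply feeds a short article to one such checkpoint."
    )
    summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
    print(summary[0]["summary_text"])
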

Week 15 (6/7): QA with Transformers; Final Presentations
Related Materials and Resources (a QA sketch follows this entry):
  • BERT-based Cross-Lingual Question Answering with DeepPavlov
  • How to Finetune mT5 to Create a Question Generator (for 100+ Languages)
  • Build an Open-Domain Question-Answering System With BERT in 3 Lines of Code
  • Sentence2MCQ using BERT Word Sense Disambiguation and T5 Transformer
Repositories:
  • Haystack: Neural Question Answering at Scale
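
A short sketch of extractive question answering with the Huggingface pipeline; the SQuAD-fine-tuned checkpoint and toy context are assumptions for illustration (Haystack builds retrieval plus a reader of this kind into a full QA stack).

    from transformers import pipeline

    qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
    answer = qa(
        question="Where does the class meet?",
        context="The seminar meets on Tuesdays in building 14, room 203 at Seoul National University.",
    )
    print(answer["answer"], answer["score"])               # extracted span and confidence
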