108.535A: 컴퓨터언어학연구 II: 트랜스포머기반의 사전학습모델과 응용

(Studies on Computational Linguistics II: Transformers-based Pre-Trained Models and Applications)


M3239.004000: 자연어처리의 응용: 트랜스포머기반의 사전학습모델과 응용

(Applications of NLP: Transformers-based Pre-Trained Models and Applications)



Hyopil Shin (Dept. of Linguistics, Seoul National University)

hpshin@snu.ac.kr
http://knlp.snu.ac.kr/

Tue 4:00 to 7:00 in building 14, room 203

T.A.: 김은진 (jyej3154@snu.ac.kr)

[Images: Transformer architectures and the Hugging Face logo. Photo by Arseny Togulev on Unsplash]

Course Description

This course examines pre-trained models built around the Transformer, now the mainstream of natural language processing and computational linguistics, along with the application areas that make use of them. Rather than training models from scratch, recent NLP centers on building Transformer-based pre-trained models and applying them to a variety of downstream tasks. Starting from the history of pre-training, the course explores its relationship to transfer learning and self-supervised learning and gives a comprehensive overview of recent developments in pre-trained models. These developments, driven by computing power and large-scale data, are examined from the perspectives of effective architecture design, the use of diverse data, improved computational efficiency, and the interpretation and theoretical analysis of pre-trained models. On this basis, we cover applications of Transformer-based pre-trained models such as Sentence-BERT, question answering, search, and text classification/summarization. Students choose from the topics offered in the course, study and present the related papers and materials, and ultimately implement a system based on them or write a paper suitable for presentation at a conference. To take this course, students should have completed Text and Natural Language Big Data Analytics / Studies on Computational Linguistics I, or be familiar with the corresponding material. Python and PyTorch are basic requirements. This course is cross-listed as Applications of NLP (Graduate School of Data Science) and Studies on Computational Linguistics II (Department of Linguistics).

Updates


  • Due to the spread of the Omicron variant, the course will begin online via Zoom, but it may switch to in-person or hybrid instruction as circumstances change. The Zoom lecture link will be announced through ETL at the start of the semester.
  • Lecture materials and Jupyter notebooks will be posted on ETL.

Useful Sites

  • Lectures

Textbook and Sites

  • Huggingface Site
  • Huggingface Transformers

Syllabus


Each entry below lists the week and date, the topics, the related materials and resources, and the repositories.

Week 1 (3/1): Introduction to Class

Week 2 (3/8): Encoder-Decoder Review; Attention Model
Related Materials and Resources:
  • Transformer-based Encoder-Decoder Models
  • Attention: Illustrated Attention
Repositories (a minimal attention sketch follows this entry):
  • PyTorch: pytorch-seq2seq
    • Sequence to Sequence Learning with Neural Networks
    • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
    • Neural Machine Translation by Jointly Learning to Align and Translate
    • Packed Padded Sequences, Masking, Inference and BLEU
    • Convolutional Sequence to Sequence Learning
    • Attention is All You Need
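
A minimal sketch of the scaled dot-product attention at the heart of the models above, in plain PyTorch; the tensor sizes and toy batch are illustrative assumptions, not values from the course notebooks.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, mask=None):
        # q, k, v: (batch, seq_len, d_k); mask entries of 0 are blocked from attention
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5      # query-key similarities
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)                # attention distribution over keys
        return weights @ v, weights                        # weighted sum of values

    q = k = v = torch.randn(2, 5, 64)                      # toy batch: 2 sequences, 5 tokens, 64 dims
    context, attn = scaled_dot_product_attention(q, k, v)
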
Week 3 (3/15): Introduction to Transformer I
Related Materials and Resources (a positional-encoding sketch follows this entry):
  • Transformers Explained Visually (Part 1): Overview of Functionality
  • Transformers Explained Visually (Part 2): How it works, step-by-step
  • Transformers Explained Visually (Part 3): Multi-head Attention, deep dive
  • Master Positional Encoding: Part I
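
To go with the positional-encoding reading, a short sketch of the sinusoidal positional encoding from "Attention is All You Need"; the lengths and model dimension are illustrative assumptions (d_model is taken to be even).

    import math
    import torch

    def sinusoidal_positional_encoding(max_len, d_model):
        # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
        # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        return pe                                          # (max_len, d_model), added to token embeddings

    pe = sinusoidal_positional_encoding(max_len=128, d_model=512)
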


Week 4 (3/22): BERT (Bidirectional Encoder Representations from Transformers)
Related Materials and Resources (a minimal fine-tuning sketch follows this entry):
  • Bert Fine-Tuning
  • BERT Fine-Tuning Tutorial with PyTorch
  • BERT Word Embeddings
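
A minimal sketch of one fine-tuning step for sequence classification with the Huggingface transformers library; the bert-base-uncased checkpoint and the two-example toy batch are assumptions for illustration (the tutorial above covers a full training loop and evaluation).

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    batch = tokenizer(["a great movie", "a boring movie"], padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])                          # hypothetical sentiment labels

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    outputs = model(**batch, labels=labels)                # forward pass returns loss and logits
    outputs.loss.backward()                                # one gradient step on the toy batch
    optimizer.step()
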


Week 5 (3/29): Pre-trained Models: Designing Effective Architecture
Topics:
  • Combining Autoregressive and Autoencoding Modeling
  • Applying Generalized Encoder-Decoder
Related Materials and Resources:
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding
  • GLM: All NLP Tasks Are Generation Tasks: A General Pretraining Framework
  • MASS: Masked Sequence to Sequence Pre-training for Language Generation
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  • PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
  • PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation

Week 6 (4/5): Pre-trained Models: Designing Effective Architecture
Topics:
  • Cognitive-Inspired Architectures
  • More Variants of Existing PTMs: Masking Strategy
Related Materials and Resources:
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
  • CogQA: Cognitive Graph for Multi-Hop Reading Comprehension at Scale
  • Language Models as Knowledge Bases?
  • REALM: Retrieval-Augmented Language Model Pre-Training
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans
  • ERNIE (1.0): Enhanced Representation through Knowledge Integration
  • ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Week 7 (4/12): Pre-trained Models: Utilizing Multi-Source Data
Topics:
  • Multilingual Pre-Training
  • Multimodal Pre-Training
Related Materials and Resources:
  • XLM-R: Unsupervised Cross-lingual Representation Learning at Scale
  • Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
  • Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
  • Zero-Shot Text-to-Image Generation
  • Learning Transferable Visual Models From Natural Language Supervision

Week 8 (4/19): Pre-trained Models: Utilizing Multi-Source Data
Topics:
  • Knowledge-Enhanced Pre-Training
Related Materials and Resources:
  • KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
  • ERNIE (1.0): Enhanced Representation through Knowledge Integration
  • ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
  • KnowBERT: Knowledge Enhanced Contextual Word Representations
  • KGLM: Using Knowledge Graphs for Fact-Aware Language Modeling
  • A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation
  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Week 9 (4/26): Pre-trained Models: Improving Computational Efficiency
Topics:
  • System-Level Optimization
  • Efficient Pre-Training
  • Model Compression
Related Materials and Resources:
  • Mixed Precision Training
  • SwapAdvisor: Push Deep Learning Beyond the GPU Memory Limit via Smart Swapping
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  • Train No Evil: Selective Masking for Task-Guided Pre-Training
  • Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
  • GroupBERT: Enhanced Transformer Architecture with Efficient Group Structures
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
  • Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  • MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
  • TernaryBERT: Distillation-aware Ultra-low Bit BERT

Week 10 (5/3): Pre-trained Models: Interpretation and Theoretical Analysis
Topics:
  • Knowledge of PTMs: Linguistic Knowledge
  • Knowledge of PTMs: World Knowledge
  • Robustness of PTMs
  • Structural Sparsity of PTMs
  • Theoretical Analysis of PTMs
Related Materials and Resources:
  • A Structural Probe for Finding Syntax in Word Representations
  • Linguistic Knowledge and Transferability of Contextual Representations
  • What Does BERT Learn about the Structure of Language?
  • Open Sesame: Getting Inside BERT's Linguistic Knowledge
  • Evaluating Commonsense in Pre-Trained Language Models
  • What BERT is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models
  • Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
  • Trick Me If You Can: Human-in-the-loop Generation of Adversarial Examples for Question Answering
  • What Does BERT Look At? An Analysis of BERT's Attention
  • Revealing the Dark Secrets of BERT
  • Why Does Unsupervised Pre-training Help Deep Learning?
  • A Theoretical Analysis of Contrastive Unsupervised Representation Learning

Week 11 (5/10): Introduction to Huggingface Transformers
Topics:
  • Summary of Tasks: Sequence Classification, Extractive Question Answering, Language Modeling, Text Generation, Named Entity Recognition, Summarization, and Translation (a pipeline sketch follows this entry)
Related Materials and Resources:
  • Introduction to Huggingface Transformers
  • Sentence Embedding with Transformers
Repositories:
  • GitHub - adsieg/text_similarity: Text Similarity
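
A hedged sketch of the Huggingface pipeline API for a few of the tasks listed above; the pipelines fall back to the library's default public checkpoints (gpt2 is named explicitly), chosen here only for illustration.

    from transformers import pipeline

    # Sequence classification (sentiment analysis)
    classifier = pipeline("sentiment-analysis")
    print(classifier("This seminar covers Transformer pre-training in depth."))

    # Named entity recognition
    ner = pipeline("ner")
    print(ner("Hugging Face is based in New York City."))

    # Text generation
    generator = pipeline("text-generation", model="gpt2")
    print(generator("Pre-trained language models", max_length=20)[0]["generated_text"])
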


Week 12 (5/17): Sentence Embedding with Transformers
Related Materials and Resources (a sentence-embedding search sketch follows this entry):
  • Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
  • LaBSE: Language-Agnostic BERT Sentence Embeddings by Google AI
  • Billion-scale Semantic Similarity Search with FAISS+SBERT
  • How to Build Semantic Search with Transformers and FAISS
Repositories:
  • Facebook Faiss: Library for efficient similarity search and clustering of dense vectors
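
A minimal sketch of semantic search with sentence-transformers embeddings and a FAISS index, in the spirit of the FAISS+SBERT articles above; the checkpoint name and toy documents are assumptions for illustration.

    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")        # any SBERT-style checkpoint works
    docs = ["Transformers power modern NLP.",
            "FAISS searches dense vectors efficiently.",
            "Seoul is the capital of Korea."]

    emb = model.encode(docs, normalize_embeddings=True)    # (n_docs, dim), L2-normalized
    index = faiss.IndexFlatIP(emb.shape[1])                # inner product = cosine on normalized vectors
    index.add(np.asarray(emb, dtype="float32"))

    query = model.encode(["fast similarity search"], normalize_embeddings=True)
    scores, ids = index.search(np.asarray(query, dtype="float32"), k=2)
    print([docs[i] for i in ids[0]])                       # nearest documents to the query
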
Week 13 (5/24): Search with Transformers; Text Classification/Generation with Transformers
Repositories (a zero-shot classification sketch follows this entry):
  • txtai
  • tldrstory
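
A sketch of Transformer-based text classification using the Huggingface zero-shot pipeline rather than txtai's or tldrstory's own APIs; the NLI checkpoint is one common public choice, assumed for illustration.

    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    result = classifier(
        "The index returns the most similar documents for a query.",
        candidate_labels=["search", "sports", "cooking"],
    )
    print(result["labels"][0], result["scores"][0])        # highest-scoring label first
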
Week 14 (5/31): Summarization with Transformers; Multimodal Transformers; TAPAS
Related Materials and Resources (a summarization sketch follows this entry):
  • TLDR!! Summarize Articles and Content With NLP
  • PEGASUS: Google's State of the Art Abstractive Summarization Model
  • Fine Tuning a T5 Transformer for Any Summarization Task
  • Summarize Reddit Comments using T5, BART, GPT-2, XLNet Models
  • DiscoBERT: A BERT that Shortens Your Reading Time
  • Transformers with Tabular Data: How to Incorporate Tabular Data with Huggingface Transformers
  • Google Unveils TAPAS, a BERT-based Neural Network for Querying Tables Using Natural Language
  • Google TAPAS is a BERT-based Model to Query Tabular Data Using Natural Language
  • PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Zhang et al.
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al.
  • Language Models are Unsupervised Multitask Learners by Radford et al.
  • Discourse-Aware Neural Extractive Text Summarization
  • Weakly Supervised Table Parsing via Pre-training by Herzig et al.
Repositories:
  • Multimodal Transformers | Transformers with Tabular Data
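
A minimal sketch of abstractive summarization with a pre-trained seq2seq checkpoint through the Huggingface pipeline; facebook/bart-large-cnn is one common public choice, assumed here for illustration.

    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    article = (
        "Pre-trained Transformer models such as PEGASUS, BART, and T5 are fine-tuned "
        "on summarization corpora and can then condense long articles into a few sentences. "
        "This sketch simply feeds a short article to one such checkpoint."
    )
    summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
    print(summary[0]["summary_text"])
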

Week 15 (6/7): QA with Transformers; Final Presentations
Related Materials and Resources (a QA sketch follows this entry):
  • BERT-based Cross-Lingual Question Answering with DeepPavlov
  • How to Finetune mT5 to Create a Question Generator (for 100+ Languages)
  • Build an Open-Domain Question-Answering System With BERT in 3 Lines of Code
  • Sentence2MCQ using BERT Word Sense Disambiguation and T5 Transformer
Repositories:
  • Haystack: Neural Question Answering at Scale
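
A short sketch of extractive question answering with the Huggingface pipeline; the SQuAD-fine-tuned checkpoint and toy context are assumptions for illustration (Haystack builds retrieval plus a reader of this kind into a full QA stack).

    from transformers import pipeline

    qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
    answer = qa(
        question="Where does the class meet?",
        context="The seminar meets on Tuesdays in building 14, room 203 at Seoul National University.",
    )
    print(answer["answer"], answer["score"])               # extracted span and confidence
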