108.413A: Computational Linguistics / Natural Language Processing


Hyopil Shin (Dept. of Linguistics, Seoul National University)

hpshin@snu.ac.kr
http://knlp.snu.ac.kr/

Mon/Wed 11:00 to 12:15, Building 7, Room 210

T.A.: 서진 (Seemdog@snu.ac.kr)

ChatGPT

(http://www.theverge.com/2016/3/11/11208078/lee-se-dol-go-google-kasparov-jennings-ai)

Course Description

This course covers Natural Language Processing (NLP)/Computational Linguistics from its theoretical foundations up to recent methods based on the Transformer and the pre-trained models built on it. Several approaches are treated from the language-model perspective: the first half of the course covers N-grams, entropy, embeddings, and text classification, while the second half focuses on sequence-to-sequence models, attention, and the Transformer. Students study the Transformer and pre-trained models built on it such as BERT and GPT, learn how to use Huggingface's Transformers library, and develop the language-processing skills to apply these models to a variety of tasks such as classification, summarization, generation, question answering, and chatbots.

Useful Sites

  • Lectures


Textbook and Sites

What is Natural Language Processing

딥러닝을 위한 자연어처리 입문 (Introduction to Natural Language Processing for Deep Learning)


Huggingface Transformers


DL Wizard: Deep Learning Tutorials based on PyTorch

 

Syllabus


Each week below lists its dates, topics, related materials and resources, and PyTorch references.
Week 1 (3/4-3/9)
Topics: Introduction to Natural Language Processing; Language Modeling I: Statistical Language Modeling: N-Grams
Materials: Natural Language Processing is Fun!; Language Modeling with N-Grams
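
As a preview of the N-gram unit, here is a minimal sketch of a bigram language model with maximum-likelihood estimates; the toy corpus is an illustrative assumption, not course data.

```python
from collections import Counter

# Toy corpus; a real experiment would use a tokenized training set.
corpus = "the cat sat on the mat . the dog sat on the log .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # 0.25: "the" occurs 4 times, "the cat" once
```
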
Week 2 (3/11-3/16)
Topics: Language Modeling I: Statistical Language Modeling: Entropy and Maximum Entropy Models
Materials: Entropy is a Measure of Uncertainty
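
A small sketch of this unit's central quantity: the Shannon entropy of a made-up unigram distribution, and the perplexity it implies.

```python
import math

# A made-up unigram distribution over a four-word vocabulary.
probs = {"the": 0.5, "cat": 0.25, "sat": 0.125, "mat": 0.125}

# Shannon entropy H(p) = -sum_x p(x) log2 p(x), measured in bits.
entropy = -sum(p * math.log2(p) for p in probs.values())

# Perplexity is 2^H: the effective branching factor of the model.
print(entropy, 2 ** entropy)  # 1.75 bits, perplexity ~3.36
```
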
Week 3 (3/18-3/23)
Topics: Text Classification
Materials: Text Classification
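
A standard baseline for this unit is Naive Bayes; below is a minimal sketch with add-one smoothing over a made-up four-document corpus (all data illustrative).

```python
from collections import Counter, defaultdict
import math

# Tiny labeled corpus, purely illustrative.
train = [("good great fun", "pos"), ("boring bad slow", "neg"),
         ("great acting", "pos"), ("bad plot", "neg")]

class_docs = defaultdict(list)
for text, label in train:
    class_docs[label] += text.split()

vocab = {w for words in class_docs.values() for w in words}

def score(text, label):
    """log P(label) + sum_w log P(w | label), with add-one smoothing."""
    counts = Counter(class_docs[label])
    total = len(class_docs[label])
    logp = math.log(sum(1 for _, l in train if l == label) / len(train))
    for w in text.split():
        logp += math.log((counts[w] + 1) / (total + len(vocab)))
    return logp

print(max(("pos", "neg"), key=lambda l: score("great fun", l)))  # pos
```
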
Week 4 (3/25-3/30)
Topics: Vector Semantics; Language Modeling II: Static Word Embedding
Materials: Vector Semantics and Embeddings
PyTorch: Linear Regression With PyTorch; Logistic Regression With PyTorch
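
In the spirit of the Logistic Regression With PyTorch tutorial linked above, a self-contained sketch on synthetic data; the features, labels, and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Random 2-D features with a linearly separable rule, for illustration only.
X = torch.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).float().unsqueeze(1)

model = nn.Linear(2, 1)                 # logits = Wx + b
loss_fn = nn.BCEWithLogitsLoss()        # sigmoid + binary cross-entropy
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

acc = ((model(X) > 0).float() == y).float().mean()
print(f"train accuracy: {acc.item():.2f}")
```
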
Week 5 (4/1-4/6)
Topics: Language Modeling II: Static Word Embedding
Materials: Vector Semantics and Embeddings
PyTorch: Word Embeddings: Encoding Lexical Semantics
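
A minimal sketch of the static-embedding idea using torch.nn.Embedding; the five-word vocabulary is hypothetical, and the vectors here are untrained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical vocabulary; indices would come from a real tokenizer.
vocab = {"the": 0, "cat": 1, "dog": 2, "sat": 3, "mat": 4}

embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

cat = embed(torch.tensor(vocab["cat"]))
dog = embed(torch.tensor(vocab["dog"]))

# Before training the vectors are random; training (e.g. a skip-gram or
# n-gram LM objective) is what pulls similar words close together.
print(F.cosine_similarity(cat, dog, dim=0).item())
```
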

Week 6 (4/8-4/13)
Topics: Sequence-to-Sequence Model: Encoder-Decoder
PyTorch:
  • pytorch-seq2seq
    • Sequence to Sequence Learning with Neural Networks
    • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
    • Neural Machine Translation by Jointly Learning to Align and Translate
    • Packed Padded Sequences, Masking, Inference and BLEU
    • Convolutional Sequence to Sequence Learning
    • Attention is All You Need
Materials: A Comprehensive Introduction to Torchtext; Torchtext Github
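
A compact sketch of the encoder-decoder pattern that the pytorch-seq2seq tutorials build up; the dimensions and the GRU choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes: source/target vocab, embedding, and hidden dims.
SRC_VOCAB, TRG_VOCAB, EMB, HID = 100, 120, 32, 64

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
    def forward(self, src):                      # src: [batch, src_len]
        _, hidden = self.rnn(self.embed(src))    # hidden: [1, batch, HID]
        return hidden                            # the "context" vector

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TRG_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TRG_VOCAB)
    def forward(self, trg_token, hidden):        # one decoding step
        output, hidden = self.rnn(self.embed(trg_token), hidden)
        return self.out(output), hidden          # logits over target vocab

enc, dec = Encoder(), Decoder()
src = torch.randint(0, SRC_VOCAB, (2, 7))        # batch of 2 source sentences
hidden = enc(src)
logits, hidden = dec(torch.zeros(2, 1, dtype=torch.long), hidden)  # <sos>=0
print(logits.shape)                              # torch.Size([2, 1, 120])
```
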

Week 7 (4/15-4/20)
Topics: Attention Model
Materials: Neural Machine Translation by Jointly Learning to Align and Translate; Attention: Illustrated Attention
PyTorch: pytorch-seq2seq (the same tutorial series listed under Week 6)
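
A minimal sketch of the attention computation at the heart of this unit: alignment scores between a decoder state and the encoder states, softmaxed into weights, then a weighted-sum context vector. Dot-product scoring is used here for brevity; Bahdanau-style attention replaces it with a small MLP. All shapes are illustrative.

```python
import torch
import torch.nn.functional as F

# Toy shapes: 1 sentence, 5 encoder states, hidden size 8.
encoder_outputs = torch.randn(1, 5, 8)   # [batch, src_len, hid]
decoder_hidden = torch.randn(1, 8)       # current decoder state

# Alignment scores between the decoder state and each encoder state.
scores = torch.bmm(encoder_outputs, decoder_hidden.unsqueeze(2)).squeeze(2)
weights = F.softmax(scores, dim=1)       # attention distribution over source

# Context vector: attention-weighted sum of encoder states.
context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
print(weights.shape, context.shape)      # [1, 5] and [1, 8]
```
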
Week 8 (4/22-4/27)
Topics: Transformer
Materials: Self-Attention: Attention is All You Need; The Illustrated Transformer
PyTorch: The Annotated Transformer
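
A sketch of single-head scaled dot-product self-attention as defined in Attention is All You Need; the sizes and random weights are illustrative.

```python
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention: softmax(QK^T / sqrt(d)) V."""
    Q, K, V = x @ w_q, x @ w_k, x @ w_v
    scores = Q @ K.transpose(-2, -1) / math.sqrt(K.size(-1))
    return F.softmax(scores, dim=-1) @ V

# Toy input: a sequence of 4 tokens with model dimension 16.
d = 16
x = torch.randn(4, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([4, 16])
```
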

Week 9 (4/22-4/27)
Topics: Language Modeling III: Dynamic Word Embedding: BERT (Bidirectional Encoder Representations from Transformers)
Materials: BERT Fine Tuning; BERT Fine-Tuning Tutorial with PyTorch; BERT Word Embeddings
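
A minimal fine-tuning sketch with Huggingface Transformers, in the spirit of the BERT fine-tuning tutorial above; the checkpoint, labels, and single gradient step are illustrative stand-ins for a real training loop.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# bert-base-uncased and num_labels=2 are illustrative choices.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tokenizer(["a great movie", "a dull movie"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# One gradient step; a real run would loop over a DataLoader with a scheduler.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)   # loss is computed internally
outputs.loss.backward()
optimizer.step()
print(outputs.loss.item())
```
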


Week 10 (4/29-5/4)
Topics: Pre-trained Models and Transfer Learning
Materials:
  • XLM-R: Unsupervised Cross-lingual Representation Learning at Scale
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding
  • MASS: Masked Sequence to Sequence Pre-training for Language Generation
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  • GLM: All NLP Tasks Are Generation Tasks: A General Pretraining Framework
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans
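
One quick way to probe what a pre-trained masked LM has learned is the Transformers fill-mask pipeline; the checkpoint and sentence below are illustrative.

```python
from transformers import pipeline

# fill-mask shows the masked-LM pre-training objective in action.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("Natural language processing is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```
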
Week 11 (5/6-5/11)
Topics: Transformers by Huggingface: Quick Tour; Summary of Tasks: Sequence Classification, Extractive Question Answering, Language Modeling, Text Generation, Named Entity Recognition, Summarization, and Translation
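
A taste of the Quick Tour: each task in the Summary of Tasks has a one-line pipeline entry point. The default checkpoints these calls download are whatever the installed library version ships; the example sentences are illustrative.

```python
from transformers import pipeline

# Sequence classification via the sentiment-analysis pipeline.
classifier = pipeline("sentiment-analysis")
print(classifier("I love computational linguistics."))

# Named entity recognition, with subword pieces grouped into entities.
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Hyopil Shin teaches at Seoul National University."))
```
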



Week 12 (5/13-5/18)
Topics: Transformers by Huggingface: Quick Tour; Summary of Tasks: Sequence Classification, Extractive Question Answering, Language Modeling, Text Generation, Named Entity Recognition, Summarization, and Translation
Materials: Various Korean text processing with Huggingface Transformers and Korean pre-trained models
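
A sketch of the Korean-model workflow; klue/bert-base is one example checkpoint from the Huggingface Hub, and any Korean masked-LM model can be substituted.

```python
from transformers import pipeline

# Korean fill-mask with an example Korean pre-trained checkpoint.
unmasker = pipeline("fill-mask", model="klue/bert-base")
for pred in unmasker("서울대학교는 [MASK]에 있다."):
    print(pred["token_str"], round(pred["score"], 3))
```
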
Week 13 (5/20-5/25)
Topics: Language Modeling IV: Large Language Models (LLMs)
  • Background for LLMs
  • Technical Evolution of GPT-series Models
  • Resources of LLMs
  • Pre-Training
  • Adaptation of LLMs: Instruction Tuning/Alignment Tuning
  • Utilization: In-Context Learning/Chain-of-Thought Prompting (see the prompt sketch after this list)
  • Capacity Evaluation
  • Practical Guidebook of Prompt Design
  • Applications
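
The prompt sketch referenced above: in-context learning conditions a frozen LM on a few demonstrations followed by the new input, with no gradient updates. gpt2 is only a small stand-in to keep the snippet runnable; actual in-context learning behavior requires far larger models.

```python
from transformers import pipeline

# A few-shot prompt: two demonstrations, then the query to complete.
prompt = (
    "Review: The plot was dull.\nSentiment: negative\n\n"
    "Review: A wonderful, moving film.\nSentiment: positive\n\n"
    "Review: I fell asleep halfway through.\nSentiment:"
)
generator = pipeline("text-generation", model="gpt2")
out = generator(prompt, max_new_tokens=2, do_sample=False)
print(out[0]["generated_text"][len(prompt):])  # the model's continuation
```
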

 





Week 14 (5/27-6/1)
Topics: Large Language Models for Korean
Materials: DaG (David and Goliath Large Language Model)
Week 15 (6/3-6/8)
Topics: Final Test and Project Presentations