Hyopil Shin (Dept. of Linguistics, Seoul National University)
hpshin@snu.ac.kr
http://knlp.snu.ac.kr/
Wed 2:00 to 4:45, Building 3, Room 103
T.A.: 조혜미 (huimei6361@snu.ac.kr)
This course covers Natural Language Processing (NLP), also known as Computational Linguistics, from its theoretical foundations to recent methods based on Transformers, BERT, and ChatGPT. The first half of the course treats N-grams, Entropy, and Embeddings; the second half covers Encoder-Decoder models, Attention, and the Transformer, and uses pretrained models and modules from Huggingface's Transformers library to implement a variety of NLP tasks. PyTorch is the programming framework for the course, and all assignments must be implemented in PyTorch. Basic knowledge of Python and deep learning is required. By working from the fundamental concepts of NLP up to recent methodologies, students will develop the ability to apply them to real language-processing problems.
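For reference, the sketch below illustrates the kind of workflow the description refers to: loading a pretrained model from Huggingface's Transformers library and running it with PyTorch. It is a minimal illustration, not official course material; the checkpoint name and example sentence are assumptions chosen for the example.

```python
# Minimal sketch (not official course material): running a pretrained
# Huggingface Transformers model with PyTorch. The checkpoint and the
# example sentence are illustrative choices, not specified by the syllabus.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Tokenize a sentence and run a forward pass without tracking gradients.
inputs = tokenizer("Natural Language Processing is fun!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to its label (POSITIVE/NEGATIVE for this checkpoint).
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```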
| Week | Date | Topics | Related Materials and Resources | PyTorch |
|------|------|--------|---------------------------------|---------|
| 1 | 9/4 | Introduction to Natural Language Processing; Language Modeling I (Statistical Language Modeling): N-Grams | Natural Language Processing is Fun!; Language Modeling with N-Grams | |
| 2 | 9/11 | Language Modeling I (Statistical Language Modeling): Entropy and Maximum Entropy Models | Entropy is a Measure of Uncertainty | |
| 3 | 9/18 | Text Classification | Text Classification | |
| 4 | 9/25 | Vector Semantics; Language Modeling II: Static Word Embedding | Vector Semantics and Embeddings | Linear Regression With PyTorch; Logistic Regression With PyTorch |
| 5 | 10/2 | Language Modeling II: Static Word Embedding | Vector Semantics and Embeddings | |
| 6 | 10/9 | Sequence-to-Sequence Model: Encoder-Decoder | | |
| 7 | 10/16 | Attention Model: Neural Machine Translation by Jointly Learning to Align and Translate | Attention: Illustrated Attention | |
| 8 | 10/23 | Transformer and Self-Attention: Attention Is All You Need | | |
| 9 | 10/30 | Language Modeling III (Dynamic Word Embedding): BERT (Bidirectional Encoder Representations from Transformers) | BERT Fine-Tuning; BERT Fine-Tuning Tutorial with PyTorch; BERT Word Embeddings | |
| 10 | 11/6 | Pre-trained Models and Transfer Learning: Masked Language Models | XLM-R: Unsupervised Cross-lingual Representation Learning at Scale; XLNet: Generalized Autoregressive Pretraining for Language Understanding; MASS: Masked Sequence to Sequence Pre-training for Language Generation; BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension; GLM: All NLP Tasks Are Generation Tasks: A General Pretraining Framework; SpanBERT: Improving Pre-training by Representing and Predicting Spans | |
| 11 | 11/13 | Transformers by Huggingface | Introduction to Huggingface Course | |
| 12 | 11/20 | Large Language Models (LLMs) | Large Language Models | |
| 13 | 11/27 | Large Language Models (LLMs) | | |
| 14 | 12/4 | Large Language Models for Korean | | |
| 15 | 12/11 | Final Test and Project Presentations | | |