M3239.001100: Text and Natural Language Big Data Analysis Methodology

108.535A: Studies in Computational Linguistics I


Hyopil Shin (Graduate School of Data Science and Dept. of Linguistics, Seoul National University)

hpshin@snu.ac.kr
https://sites.google.com/snu.ac.kr/gsds-nlp/home
http://knlp.snu.ac.kr/

Tue/Thu 3:30 to 4:45 in Building 942, Room 302

T.A: 김석기 (blaqdraq77@snu.ac.kr)

[Image: transformer] (http://www.theverge.com/2016/3/11/11208078/lee-se-dol-go-google-kasparov-jennings-ai)

Course Description

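This course covers Natural Language Processing, or Computational Linguistics, from its theoretical foundations to recent Transformer- and BERT-based methods. The first half of the course covers regular expressions, N-grams, entropy, and embeddings. The second half reviews regression, deep learning, encoder-decoder models, and attention, and then uses the pretrained models and modules in Huggingface Transformers to implement a variety of natural language processing tasks. PyTorch is the programming framework for the course, and all assignments must be implemented in PyTorch. Basic knowledge of Python and deep learning is required. The goal is for students to learn everything from the basic concepts of natural language processing to recent methods, and to be able to apply them to real language processing problems.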

Updates

  • Lectures are held live online via Zoom. The Zoom URL will be announced through eTL at the start of the semester.
  • Please set up Python, PyTorch, and Colab for class!

Useful Sites

  • Lectures


Textbook and Sites

Speech and Language Processing (3rd ed. Draft)


Huggingface Transformers


DL wizard

Deep Learning Tutorials based on PyTorch

Syllabus


Date Topics Related Materials and Resources
PyTorch
1 9/2 & 9/7

Introduction to Natural Language Processing


Regular Expressions, Text Normalization and Edit Distance

Natural Language Processing is Fun!

Regular Expressions, Text Normalization and Edit Distance

PyTorch:
2 9/9 & 9/14 Regular Expressions, Text Normalization and Edit Distance

Language Modeling with N-Grams
Language Modeling with N-Grams
3 9/16 & 9/21 Language Modeling with N-Grams

Entropy and Maximum Entropy Models
Entropy is a Measure of Uncertainty
 
4 9/23 & 9/28 Text Classification



Text Classification

PyTorch:

Linear Regression With PyTorch
Logistic Regression With PyTorch
5 9/30 Vector Semantics and Embeddings
Vector Semantics and Embeddings


PyTorch:

Word Embeddings: Encoding Lexical Semantics

6 10/5 & 10/7 Vector Semantics and Embeddings
Vector Semantics and Embeddings


PyTorch:

Sentiment Analysis (IMDB)
7 10/12 & 10/14

Sequence to Sequence Model: Encoder-Decoder



Mid-Term Test


PyTorch:
  • pytorch-seq2seq
    • Sequence to Sequence Learning with Neural Networks
    • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
    • Neural Machine Translation by Jointly Learning to Align and Translate
    • Packed Padded Sequences, Masking, Inference and BLEU
    • Convolutional Sequence to Sequence Learning
    • Attention is All You Need


A Comprehensive Introduction to Torchtext

Torchtext Github

8 10/19 & 10/21

Sequence to Sequence Model: Encoder-Decoder



Attention Model

Neural Machine Translation By Jointly Learning to Align and Translate

Attention: Illustrated Attention

PyTorch:
  • pytorch-seq2seq
    • Sequence to Sequence Learning with Neural Networks
    • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
    • Neural Machine Translation by Jointly Learning to Align and Translate
    • Packed Padded Sequences, Masking, Inference and BLEU
    • Convolutional Sequence to Sequence Learning
    • Attention is All You Need
9 10/26 & 10/28 Transformer
Self Attention: Attention is All You Need

The Illustrated Transformer


PyTorch:
The Annotated Transformer

10 11/2 & 11/4

BERT (Bidirectional Encoder Representations from Transformers)



Transformers by Huggingface:

Quick Tour
Summary of Tasks: Sequence Classification, Extractive Question Answering, Language Modeling, Text Generation, Named Entity Recognition, Summarization, and Translation

BERT Fine Tuning
BERT Fine-Tuning Tutorial with PyTorch

BERT Word Embeddings

Transformers by Huggingface and Full Documentation

11 11/9 & 11/11 Transformers by Huggingface:

Quick Tour
Summary of Tasks: Sequence Classification, Extractive Question Answering, Language Modeling, Text Generation, Named Entity Recognition, Summarization, and Translation
Transformers by Huggingface and Full Documentation



12 11/16 & 11/18 Sentence Embedding With Transformers

Sentence-BERT: Sentence Embeddings using Siamese-Networks

13 11/23 & 11/25 Transformer-based Applications

Semantic Search with Transformers
  • Introducing txtai, an AI-Powered Search engine on Transformers

Similarity Search with FAISS

Question-Answering with Transformers

Group Projects and Presentations
14 11/30 & 12/2 Transformers by Huggingface For Korean

Naver Sentiment Movie Corpus
KorNLI
KorQuAD
KoreanNERCorpus
Naver NLP Challenge NER
NLP Challenge SRL

Group Projects and Presentations
15 12/7 & 12/9 Final Test and Project Presentations