108.535A: Studies in Computational Linguistics I


Hyopil Shin (Dept. of Linguistics, Seoul National University)

hpshin@snu.ac.kr
http://knlp.snu.ac.kr/

Wed 2:00 to 4:45, Building 3, Room 103

T.A.: 조혜미 (huimei6361@snu.ac.kr)


(http://www.theverge.com/2016/3/11/11208078/lee-se-dol-go-google-kasparov-jennings-ai)

Course Description

This course covers Natural Language Processing (Computational Linguistics) from its theoretical foundations up to recent methods based on Transformers, BERT, and ChatGPT. The first half of the course treats N-grams, entropy, and embeddings; the second half covers encoder-decoder models, attention, and the Transformer, and puts them into practice by implementing a variety of NLP tasks with pre-trained models and modules from Huggingface's Transformers library. PyTorch is the programming framework, and all assignments are to be implemented in PyTorch. Basic knowledge of Python and deep learning is required. By working through NLP from its basic concepts to recent methodologies, students will develop the ability to apply them to real language-processing problems.

Updates

Useful Sites

  • Lectures


Textbook and Sites

Speech and Language Processing (3rd ed. Draft)

Huggingface Transformers

Deep Learning Wizard: Deep Learning Tutorials based on PyTorch

Syllabus


Each week below lists its date, topics, related materials and resources, and, where assigned, PyTorch tutorials.

Week 1 (9/4)
Topics: Introduction to Natural Language Processing; Language Modeling I: Statistical Language Modeling (N-Grams)
Materials: Natural Language Processing is Fun!; Language Modeling with N-Grams
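
To make the N-gram idea concrete, here is a minimal bigram model with maximum-likelihood estimates; the toy corpus is invented for illustration:

```python
from collections import Counter

# Toy corpus; a real language model would be trained on a large tokenized text.
tokens = "the cat sat on the mat the cat ate the fish".split()

# Count unigrams and bigrams.
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def bigram_prob(w1, w2):
    # Maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1).
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # 2 of the 4 "the" tokens are followed by "cat" -> 0.5
```
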
Week 2 (9/11)
Topics: Language Modeling I: Statistical Language Modeling (Entropy and Maximum Entropy Models)
Materials: Entropy is a Measure of Uncertainty
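
A small worked example of the entropy and perplexity definitions covered this week, computed over the unigram distribution of an invented toy corpus:

```python
import math
from collections import Counter

tokens = "the cat sat on the mat the cat ate the fish".split()
counts = Counter(tokens)
total = len(tokens)

# Unigram entropy H = -sum_w P(w) * log2 P(w), in bits per word.
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
print(f"entropy    = {entropy:.3f} bits/word")

# Perplexity is 2^H: the effective number of equally likely word choices.
print(f"perplexity = {2 ** entropy:.2f}")
```
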
Week 3 (9/18)
Topics: Text Classification
Materials: Text Classification
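
As a sketch of the classic approach from the text-classification reading, a tiny multinomial Naive Bayes with add-one smoothing; the documents and labels are made up:

```python
import math
from collections import Counter, defaultdict

# Tiny labeled toy corpus (invented).
docs = [("good great fun", "pos"), ("great good", "pos"),
        ("bad awful boring", "neg"), ("awful bad", "neg")]

# Class priors and per-class word frequencies.
class_counts = Counter(label for _, label in docs)
word_counts = defaultdict(Counter)
for text, label in docs:
    word_counts[label].update(text.split())

vocab = {w for text, _ in docs for w in text.split()}

def log_posterior(text, label):
    # log P(label) + sum_w log P(w | label), with add-one (Laplace) smoothing.
    # Unknown words (e.g. "but") get the same smoothed mass in every class,
    # so they do not affect the decision.
    total = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(docs))
    for w in text.split():
        score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return score

text = "good but bad bad"
print(max(class_counts, key=lambda c: log_posterior(text, c)))  # -> "neg"
```
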
Week 4 (9/25)
Topics: Vector Semantics; Language Modeling II: Static Word Embedding
Materials: Vector Semantics and Embeddings
PyTorch: Linear Regression With PyTorch; Logistic Regression With PyTorch
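
A compact sketch of the training-loop pattern those two tutorials build on, fit to synthetic data; swapping the MSE loss for a sigmoid plus binary cross-entropy turns the same loop into logistic regression:

```python
import torch
import torch.nn as nn

# Synthetic data around y = 2x + 1; the model should recover slope and intercept.
x = torch.linspace(0, 1, 50).unsqueeze(1)
y = 2 * x + 1 + 0.01 * torch.randn_like(x)

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(1000):
    optimizer.zero_grad()                          # clear old gradients
    loss = nn.functional.mse_loss(model(x), y)     # mean squared error
    loss.backward()                                # backpropagate
    optimizer.step()                               # update parameters

print(model.weight.item(), model.bias.item())      # approximately 2.0 and 1.0
```
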
Week 5 (10/2)
Topics: Language Modeling II: Static Word Embedding
Materials: Vector Semantics and Embeddings
PyTorch: Word Embeddings: Encoding Lexical Semantics
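
A minimal illustration of the tutorial's central object: nn.Embedding as a trainable index-to-vector lookup table (the vocabulary and dimension here are arbitrary):

```python
import torch
import torch.nn as nn

# nn.Embedding maps each word index to a dense, trainable vector.
vocab = {"king": 0, "queen": 1, "apple": 2}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

king, queen = embedding(torch.tensor([vocab["king"], vocab["queen"]]))

# Before training the vectors are random; after training on a corpus objective
# (e.g. skip-gram or CBOW), similar words end up with similar vectors.
print(torch.cosine_similarity(king, queen, dim=0).item())
```
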

Week 6 (10/9)
Topics: Sequence to Sequence Model: Encoder-Decoder (a skeletal implementation is sketched below)
PyTorch:
  • pytorch-seq2seq
    • Sequence to Sequence Learning with Neural Networks
    • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
    • Neural Machine Translation by Jointly Learning to Align and Translate
    • Packed Padded Sequences, Masking, Inference and BLEU
    • Convolutional Sequence to Sequence Learning
    • Attention is All You Need
Materials: A Comprehensive Introduction to Torchtext; Torchtext Github
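
A skeletal encoder-decoder in the spirit of the pytorch-seq2seq tutorials, reduced to the two modules and their interface; all sizes and the random data are placeholders, not a working translator:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab=100, emb=32, hid=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True)
    def forward(self, src):
        _, h = self.rnn(self.embed(src))
        return h                        # final hidden state summarizes the source

class Decoder(nn.Module):
    def __init__(self, vocab=100, emb=32, hid=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)
    def forward(self, tgt, h):
        o, h = self.rnn(self.embed(tgt), h)
        return self.out(o), h           # logits over the target vocabulary

src = torch.randint(0, 100, (2, 7))     # batch of 2 source sequences, length 7
tgt = torch.randint(0, 100, (2, 5))     # batch of 2 target sequences, length 5
h = Encoder()(src)
logits, _ = Decoder()(tgt, h)
print(logits.shape)                     # torch.Size([2, 5, 100])
```
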

Week 7 (10/16)
Topics: Attention Model
Materials: Neural Machine Translation by Jointly Learning to Align and Translate; Attention: Illustrated Attention
PyTorch: pytorch-seq2seq (the same tutorial series listed under Week 6)
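
The core attention computation in a few lines. Note that the Bahdanau paper cited above scores with a small additive network; this sketch uses the simpler scaled dot-product score for brevity:

```python
import torch
import torch.nn.functional as F

def dot_product_attention(query, keys, values):
    # query: (B, H) decoder state; keys/values: (B, T, H) encoder outputs.
    scores = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)       # (B, T) similarity scores
    weights = F.softmax(scores / keys.size(-1) ** 0.5, dim=1)     # normalized attention weights
    context = torch.bmm(weights.unsqueeze(1), values).squeeze(1)  # (B, H) weighted sum
    return context, weights

query = torch.randn(2, 64)                 # batch of 2 decoder hidden states
encoder_outputs = torch.randn(2, 7, 64)    # 7 encoder states per example
context, weights = dot_product_attention(query, encoder_outputs, encoder_outputs)
print(context.shape, weights.sum(dim=1))   # torch.Size([2, 64]); each weight row sums to 1
```
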
Week 8 (10/23)
Topics: Transformer; Self-Attention: Attention Is All You Need
Materials: The Illustrated Transformer; The Transformer
PyTorch: The Annotated Transformer
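
A minimal look at self-attention with PyTorch's built-in modules; the shapes are arbitrary:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 10, 64)   # (batch, sequence length, model dimension)

# Multi-head self-attention: every position attends to every other position.
mha = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)
out, weights = mha(x, x, x)  # query = key = value = x is what makes it *self*-attention
print(weights.shape)         # torch.Size([2, 10, 10]): one weight per query-key pair

# A full pre-built encoder layer: self-attention + feed-forward + residuals + layer norm.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=8, batch_first=True)
print(layer(x).shape)        # torch.Size([2, 10, 64])
```
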

Week 9 (10/30)
Topics: Language Modeling III: Dynamic Word Embedding: BERT (Bidirectional Encoder Representations from Transformers); BERT Fine-Tuning
Materials: BERT Fine-Tuning Tutorial with PyTorch; BERT Word Embeddings
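
The shortest sketch of the fine-tuning setup from the tutorial, assuming the standard bert-base-uncased checkpoint and an invented two-sentence batch:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Pre-trained BERT encoder plus a freshly initialized 2-class classification head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["a great movie", "a terrible movie"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)  # passing labels makes the model return a loss
outputs.loss.backward()                  # in fine-tuning, an optimizer step would follow
print(outputs.logits.shape)              # torch.Size([2, 2])
```
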


Week 10 (11/6)
Topics: Pre-trained Models and Transfer Learning; Masked Language Models
Materials:
  • XLM-R: Unsupervised Cross-lingual Representation Learning at Scale
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding
  • MASS: Masked Sequence to Sequence Pre-training for Language Generation
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  • GLM: All NLP Tasks Are Generation Tasks: A General Pretraining Framework
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans
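
A one-call illustration of the masked-LM objective that several of these papers build on, using the Huggingface fill-mask pipeline with bert-base-uncased:

```python
from transformers import pipeline

# A masked language model reconstructs the token behind [MASK] using context
# from both directions; this is the pre-training objective of BERT-style models.
fill = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill("Seoul is the capital of [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```
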
Week 11 (11/13)
Topics: Transformers by Huggingface
Materials:
  • Quick Tour
  • Summary of Tasks: Sequence Classification, Extractive Question Answering, Language Modeling, Text Generation, Named Entity Recognition, Summarization, and Translation
  • Introduction to Huggingface Course
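
Three of the listed tasks, each as a single pipeline call; the example sentences are invented, the generation example pins gpt2 explicitly, and the other tasks download the library's default checkpoints on first use:

```python
from transformers import pipeline

# Sequence classification (sentiment analysis).
classifier = pipeline("sentiment-analysis")
print(classifier("Computational linguistics is fascinating."))

# Extractive question answering: the answer is a span of the context.
qa = pipeline("question-answering")
print(qa(question="Where does the class meet?",
         context="The class meets on Wednesdays in building 3, room 103."))

# Text generation with GPT-2.
generator = pipeline("text-generation", model="gpt2")
print(generator("Natural language processing is", max_new_tokens=10)[0]["generated_text"])
```
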


Week 12 (11/20)
Topics: Large Language Models (LLMs)
  • Background for LLMs
  • Technical Evolution of GPT-series Models
  • Resources of LLMs
  • Pre-Training
  • Adaptation of LLMs: Instruction Tuning/Alignment Tuning
  • Utilization: In-Context Learning/Chain-of-Thought Prompting (a small sketch follows below)
  • Capacity Evaluation
  • Practical Guidebook of Prompt Design
  • Applications
Materials: Large Language Models
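
A toy demonstration of in-context (few-shot) learning: the prompt alone specifies the task through examples, with no parameter updates. gpt2 stands in for a real LLM here, so its outputs will be unreliable; the prompt format follows the GPT-3 paper's translation example:

```python
from transformers import pipeline

# Few-shot prompt: the model is expected to infer the task from the examples.
prompt = (
    "Translate English to French.\n"
    "sea otter -> loutre de mer\n"
    "peppermint -> menthe poivree\n"
    "cheese ->"
)
generator = pipeline("text-generation", model="gpt2")
print(generator(prompt, max_new_tokens=5)[0]["generated_text"])
```
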
Week 13 (11/27)
Topics: Large Language Models (LLMs), continued (same outline as Week 12)

Week 14 (12/4)
Topics: Large Language Models for Korean

Week 15 (12/11)
Topics: Final Test and Project Presentations