MSDS IST 664: Natural Language Processing
MSDS - Q4: IST664
Description:
Linguistic and computational aspects of natural language processing technologies. Lectures, readings, and projects in the computational techniques required to perform all levels of linguistic processing of text.
Course Description:
This course is designed to develop an understanding of how natural language processing (NLP) can process written text and produce a linguistic analysis that can be used in other applications. This goal will be achieved through:
- Readings, lectures, and class discussions of the multiple levels of linguistic analysis required for a computer to accept natural language input, interpret it, and carry out a particular application.
- Lab exercises and assignments in using some of the computational techniques required to perform these levels of natural language processing of text.
- Studies of real-world applications that incorporate substantive NLP modules.
The course primarily covers NLP techniques at each level of linguistic analysis, moving from tokenization through word-level semantics, part-of-speech tagging, syntax, and semantics up to the discourse level. It also covers applications of these techniques, such as information retrieval, question answering, sentiment analysis, summarization, and dialogue systems.
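As a rough illustration of the lowest levels of analysis named above (tokenization, then corpus statistics over the resulting words), the following minimal sketch tokenizes a sentence and scores its bigrams by pointwise mutual information. It uses only the Python standard library rather than NLTK, and the function names `tokenize` and `bigram_pmi`, the regular expression, and the sample sentence are all hypothetical choices made for this example, not material taken from the course labs.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens (letters, optional apostrophe part)."""
    # Assumption: a very simple English-only tokenizer, for illustration only.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} using maximum-likelihood probability estimates."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n_unigrams
        p_y = unigrams[w2] / n_unigrams
        # PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


text = "the cat sat on the mat and the cat slept"
tokens = tokenize(text)
scores = bigram_pmi(tokens)

# Show the three highest-scoring bigrams.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 2))
```

On a corpus this small the scores are dominated by words that occur only once, which is one reason such statistics are normally computed over much larger text collections and with frequency cutoffs.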
Deliverables
Assessments
- Homework 1: Corpus Statistics and Mutual Information
- Homework 2: Regular Expressions
- Homework 3: Context-Free Grammars
- Final Project: Classification of Text
- NLP Investigation
Class Outline
Week 1 | Introduction to NLP and Processing Text Words
- 1.1 Readings
- 1.2 Resources
- 1.3 Introduction to NLP
- 1.4 Corpus Words
- 1.5 Software Installation: Python 3.x and NLTK
- 1.6 Lab Session: Processing Text Words in NLTK
Week 2 | Corpus Statistics and Language Modeling
- 2.1 Readings
- 2.2 Resources
- 2.3 Corpus Statistics: Unigrams (Words), Bigrams, and Mutual Information
- 2.4 Language Modeling
- 2.5 Lab Session: Bigram Frequencies and Mutual Information
Week 3 | Regular Expressions, Morphology, and Processing Text Files
- 3.1 Readings
- 3.2 Resources
- 3.3 Regular Expressions
- 3.4 Morphology
- 3.5 Processing Text
- 3.6 Lab Session: Processing Text and Stemming
- 3.7 Lab Session: Regular Expressions and Tokenization
Week 4 | POS Tagging and Introduction to ML
- 4.1 Readings
- 4.2 Resources
- 4.3 What Are Part-of-Speech (POS) Tags?
- 4.4 POS Tagging: How to Assign Tags
- 4.5 Machine Learning (ML) Introduction
- 4.6 Lab Session: POS Tagging in NLTK
Week 5 | Context-Free Grammars (CFG) and Parsing
- 5.1 Readings
- 5.2 Resources
- 5.3 Context-Free Grammars
- 5.4 Parsing
- 5.5 Parsing Details
- 5.6 Lab Session: Parsing with Simple CFG
Week 6 | Semantics
- 6.1 Readings
- 6.2 Resources
- 6.3 Lexical Semantics
- 6.4 Semantics
- 6.5 Semantic Role Labeling
- 6.6 Lab Session: WordNet
Week 7 | Discourse and Dialogue
- 7.1 Readings
- 7.2 Resources
- 7.3 Discourse Introduction and Cohesion
- 7.4 Discourse Coreference and Structures
- 7.5 Dialogue and Planning
- 7.6 Lab Session: Classification in NLTK
Week 8 | Sentiment Analysis
- 8.1 Readings
- 8.2 Resources
- 8.3 Sentiment Introduction
- 8.4 Sentiment Analysis
- 8.5 Opinion Analysis and Other Topics
- 8.6 Lab Session: Sentiment Classification in NLTK
Week 9 | NLP Applications: IE, MT and Summarization
- 9.1 Information Extraction (IE)
- 9.2 Machine Translation (MT)
- 9.3 Summarization
- 9.4 Lab Session: More Features and Evaluation for Classification
Week 10 | NLP Applications: IR, QA and Conversational Agents
- 10.1 Information Retrieval (IR)
- 10.2 Question Answering (QA)
- 10.3 Conversational Agents
- 10.4 Lab Session: Final Projects
- 10.5 Final Projects
Week 11 | Final Project
- 11.1 (Optional) Deep Learning for NLP