MSDS IST 664: Natural Language Processing

2 minute read

Natural Language Processing

MSDS - Q4: IST664

Description:

Linguistic and computational aspects of natural language processing technologies. Lectures, readings, and projects in the computational techniques required to perform all levels of linguistic processing of text.

Course Description:

This course is designed to develop an understanding of how natural language processing (NLP) can process written text and produce a linguistic analysis that can be used in other applications. This goal will be achieved by:

  • Readings, lectures, and class discussions of the multiple levels of linguistic analysis required for a computer to accept natural language input, interpret it, and carry out a particular application.
  • Lab exercises and assignments in using some of the computational techniques required to perform these levels of natural language processing of text.
  • Studies of real­world applications that incorporate substantive NLP modules.

The course primarily covers the techniques of NLP in the levels of linguistic analysis, going through tokenization, word­level semantics, part­of­speech tagging, syntax, semantics, and on up to the discourse level. It also includes the use of the NLP techniques, such as information retrieval, question answering, sentiment analysis, summarization and dialogue systems, in applications.

Deliverables

Class Outline

Assessments

  • Homework 1: Corpus Statistics and Mutual Information
  • Homework 2: Regular Expressions
  • Homework 3: Context-Free Grammars
  • Final Project: Classification of Text
  • NLP Investigation

Week 1 | Introduction to NLP and Processing Text Words

  • 1.1 Readings
  • 1.2 Resources
  • 1.3 Introduction to NLP
  • 1.4 Corpus Words
  • 1.5 Software Installation: Python 3.x and NLTK
  • 1.6 Lab Session: Processing Text Words in NLTK

Week 2 | Corpus Statistics and Language Modeling

  • 2.1 Readings
  • 2.2 Resources
  • 2.3 Corpus Statistics: Unigrams (Words), Bigrams, and Mutual Information
  • 2.4 Language Modeling
  • 2.5 Lab Session: Bigram Frequencies and Mutual Information

Week 3 | Regular Expressions, Morphology, and Processing Text Files

  • 3.1 Readings
  • 3.2 Resources
  • 3.3 Regular Expressions
  • 3.4 Morphology
  • 3.5 Processing Text
  • 3.6 Lab Session: Processing Text and Stemming
  • 3.7 Lab Session: Regular Expressions and Tokenization

Week 4 | POS tagging and Intro to ML

  • 4.1 Readings
  • 4.2 Resources
  • 4.3 What Are Part-of-Speech (POS) Tags?
  • 4.4 POS Tagging: How to Assign Tags
  • 4.5 Machine Learning (ML) Introduction
  • 4.6 Lab Session: POS tagging in the NLTK

Week 5 | Context Free Grammars (CFG) and Parsing

  • 5.1 Readings
  • 5.2 Resources
  • 5.3 Context-Free Grammars
  • 5.4 Parsing
  • 5.5 Parsing Details
  • 5.6 Lab Session: Parsing with Simple CFG

Week 6 | Semantics

  • 6.1 Readings
  • 6.2 Resources
  • 6.3 Lexical Semantics
  • 6.4 Semantics
  • 6.5 Semantic Role Labeling
  • 6.6 Lab Session: WordNet

Week 7 | Discourse and Dialogue

  • 7.1 Readings
  • 7.2 Resources
  • 7.3 Discourse Introduction and Cohesion
  • 7.4 Discourse Coreference and Structures
  • 7.5 Dialogue and Planning
  • 7.6 Lab Session: Classification in NLTK

Week 8 | Sentiment Analysis

  • 8.1 Readings
  • 8.2 Resources
  • 8.3 Sentiment Introduction
  • 8.4 Sentiment Analysis
  • 8.5 Opinion Analysis and Other Topics
  • 8.6 Lab Session: Sentiment Classifications in the NLTK

Week 9 | NLP Applications: IE, MT and Summarization

  • 9.1 Information Extraction (IE)
  • 9.2 Machine Translation (MT)
  • 9.3 Summarization
  • 9.4 Lab Session: More Features and Evaluation for Classification

Week 10 | NLP Applications: IR, QA and Conversational Agents

  • 10.1 Information Retrieval (IR)
  • 10.2 Question Answering (QA)
  • 10.3 Conversational Agents
  • 10.4 Lab Session: Final Projects
  • 10.5 Final Projects

Week 11 | Final Project

  • 11.1 (Optional) Deep Learning for NLP

Tags:

Updated: